Okay, it’s not blogging, but it took me forever to track down how to do this, so I figured I’d post it for others to use.

The setting: An ecommerce site.
The need: Searching for items in the inventory, but returning the same result whether the search terms are singular or plural, without messing up an exact phrase.
The solution: create two sets of keywords, singular and plural and search both literal and for the individual words via a regular expression.

I’m sure there are more efficient ways of doing this, but I couldn’t find any, and this is what I cam up with.

//$get_search is the search keyword(s) submitted via a
//  GET and filtered to block any hacking attempts

// $catSort is the sort criteria 'relevance',
//    'price:lo to hi', 'price:ho to lo', etc.
  $catSort=$_GET['catSort'];
  if(!$catSort){$catSort='relevance'; $order=' ORDER BY `rel` DESC';}
  $itemsPerPage=$_GET['itemsPerPage'];
  if(!$itemsPerPage){$itemsPerPage='20';}
  if($catSort=='l2h'){$order=' ORDER BY `price` ASC';}
  if($catSort=='h2l'){$order=' ORDER BY `price` DESC';}
  if($catSort=='item'){$order=' ORDER BY `productname`';}

//clean up the keyword some
  $get_search=rtrim(ltrim(urldecode($get_search

//  this array helps converts the key word (or last keyword in a phrase) to the plural form
$plural_rules = array(
'/(x|ch|ss|sh)$/' => '\1es',            # search, switch, fix, box, process, address
'/series$/' => '\1series',              #series is the same
'/([^aeiouy]|qu)y$/' => '\1ies',        # query, ability, agency
'/(?:([^f])fe|([lr])f)$/' => '\1\2ves', # half, safe, wife
'/sis$/' => 'ses',                      # basis, diagnosis
'/ex$/' => 'ices',                      # index
'/([ti])um$/' => '\1a',                 # datum, medium
'/person$/' => 'people',                # person, salesperson
'/man$/' => 'men',                      # man, woman, spokesman
'/foot$/' => 'feet',
'/louse$/' => 'lice',
'/mouse$/' => 'mice',
'/tooth$/' => 'teeth',
'/child$/' => 'children',
'/([nti])on$/' => 'a',                  # criterion
'/(.*)status$/' => '\1statuses',
'/s$/' => 's',                          # no change (compatibility)
'/$/' => 's'                            # everything else
);

//  this array helps converts the key word (or last keyword in a phrase) to the singular form
$singular_rules = array(
'/(xes|ches|sses|shes)$/e' =>           # the /e allows an executable phrase within the replacement
    "str_replace('es','',$1)",          # boxes, classes
'/series$/' => 'series',
'/([^aeiouy]|qu)ies$/' => '\1y',
'/([lr])ves$/' => '\1\2f',              # halves, scarves
'/([^f])ves$/' => '\1fe',               # wives
'/ses$/' => 'sis',                      # bases, diagnoses
'/ices$/' => 'ex',                      # indices
'/([ti])a$/' => '\1um',                 # data, media
'/people$/' => 'person',                # people, salespeople
'/men$/' => 'man',                      # men, women, spokesmen
'/feet$/' => 'foot',
'/lice$/' => 'louse',
'/mice$/' => 'louse',
'/teeth$/' => 'tooth',
'/children$/' => 'child',
'/([nti])a$/' => 'on',                  # criteria
'/(.*)statuses$/' => '\1status',
'/[aious]s$/' => '',                    # no change (compatibility)
'/s$/' => ''                            # everything else that ends in an 's'
);

$string=$get_search;
$match=false;
foreach($plural_rules as $pattern => $replacement)
  {
  if(!$match and preg_match($pattern, $string))
    {
    $out=preg_replace($pattern, $replacement, $string);
    $match=true;
    }
  }
if($out and $out!=$get_search){$plural=$out;}
else{$plural=$get_search;}

$match=false;$out='';
foreach($singular_rules as $pattern => $replacement)
  {
  if(!$match and preg_match($pattern, $string))
    {
    $out=preg_replace($pattern, $replacement, $string);
    $match=true;
    }
  }
if($out and $out!=$get_search){$singular=$out;}
else{$singular=$get_search;}
}

// the regexp will crash if we send it an empty
//   variable. So we do this
if(!$get_search){$get_search='[[[]]]';}

//I typical throw in a capital letter to distinguish
//  arrays from variables
$get_Search=explode(' ',$get_search);

//Yes, I know there's an easier way to weed out repeated
//   spaces, but I was in a hurry
$Temp='';
foreach($get_Search as $el)
  {
  if($el){$Temp[]=$el;}
  }
$get_Search=$Temp;

//we include all the keywords for the regexp part
$regexp=implode('|',$get_Search);

//these cell names are samples
// the first part (with all the IFs) allows us to create a
//   relevance score, customized to weight the individal
//   cells searched. We place a lot of weight on an exact
//   name match. and we do it once for the singular form
//   and once for the plural form and once for the regexp.
// $order we set previously so the search can be ordered
//   by price, item name or relevance
  $queryText = sprintf("
SELECT `productid`,
 IF(`productname` LIKE '%%%s%%',20,0)
+IF(`description` LIKE '%%%s%%',4,0)
+IF(`manufacturer` LIKE '%%%s%%',1,0)
+IF(`productcode` LIKE '%%%s%%',1,0)
+IF(`productname` LIKE '%%%s%%',20,0)
+IF(`description` LIKE '%%%s%%',4,0)
+IF(`manufacturer` LIKE '%%%s%%',1,0)
+IF(`productcode` LIKE '%%%s%%',1,0)
+IF(`productname` REGEXP '[[:<:]]%s[[:>:]]' ,2,0)
+IF(`description` REGEXP '[[:<:]]%s[[:>:]]' ,2,0)
+IF(`manufacturer` REGEXP '[[:<:]]%s[[:>:]]' ,1,0)
+IF(`productcode` REGEXP '[[:<:]]%s[[:>:]]' ,1,0) AS `rel`
FROM `ecommerce_products`
WHERE (`productname` LIKE '%%%s%%'
   OR `description` LIKE '%%%s%%'
   OR `manufacturer` LIKE '%%%s%%'
   OR `productcode` LIKE '%%%s%%'
   OR `productname` LIKE '%%%s%%'
   OR `description` LIKE '%%%s%%'
   OR `manufacturer` LIKE '%%%s%%'
   OR `productcode` LIKE '%%%s%%'
   OR `productname` REGEXP '[[:<:]]%s[[:>:]]'
   OR `description` REGEXP '[[:<:]]%s[[:>:]]'
   OR `manufacturer` REGEXP '[[:<:]]%s[[:>:]]'
   OR `productcode` REGEXP '[[:<:]]%s[[:>:]]') $order;
",
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($plural),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($singular),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp),
         mysql_real_escape_string($regexp));

// I use a db link variable to allow error messages
  $query=mysql_query($queryText, $link);
  if(mysql_errno($link)){echo ": " . mysql_error($link) . "\n<hr>";}

  if(mysql_num_rows($query))
    {
    while ($dbRow = mysql_fetch_array($query))
      {
      $id=$dbRow[0];
// this is to weed out duplicates
      $return[$id]++;
      }

// this creates a nice array of product ids for us to work with.
// using only the product ids we can begin with the entire list,
//   without over burdening the server
// from here we simply grab the ones we need (based on the
//    start # and pagelength #)
// using the product id, I use a function that then grabs
//    all the pertinent data about that item.
  foreach($return as $key=>$value)
    {
    $prodList[]=$key;
    }

Comments are closed.

About me

I've been developing web sites for over 12 years. I started with HTML, moved on to Perl and now do mostly PHP with a lot of MySQL and Javascript as well.

The purpose of this blog is to write about many of the simpler scripting solutions bloggers are either unaware of or unable to implement. Hopefully I'll have something you can use

Danny Carlton

Advertising
Advertising

Switch to our mobile site