Archive for the ‘MySQL’ Category
Okay, it’s not blogging, but it took me forever to track down how to do this, so I figured I’d post it for others to use.
The setting: An ecommerce site.
The need: Searching for items in the inventory, but returning the same result whether the search terms are singular or plural, without messing up an exact phrase.
The solution: create two sets of keywords, singular and plural and search both literal and for the individual words via a regular expression.
I’m sure there are more efficient ways of doing this, but I couldn’t find any, and this is what I cam up with.
//$get_search is the search keyword(s) submitted via a
// GET and filtered to block any hacking attempts
// $catSort is the sort criteria 'relevance',
// 'price:lo to hi', 'price:ho to lo', etc.
$catSort=$_GET['catSort'];
if(!$catSort){$catSort='relevance'; $order=' ORDER BY `rel` DESC';}
$itemsPerPage=$_GET['itemsPerPage'];
if(!$itemsPerPage){$itemsPerPage='20';}
if($catSort=='l2h'){$order=' ORDER BY `price` ASC';}
if($catSort=='h2l'){$order=' ORDER BY `price` DESC';}
if($catSort=='item'){$order=' ORDER BY `productname`';}
//clean up the keyword some
$get_search=rtrim(ltrim(urldecode($get_search
// this array helps converts the key word (or last keyword in a phrase) to the plural form
$plural_rules = array(
'/(x|ch|ss|sh)$/' => '\1es', # search, switch, fix, box, process, address
'/series$/' => '\1series', #series is the same
'/([^aeiouy]|qu)y$/' => '\1ies', # query, ability, agency
'/(?:([^f])fe|([lr])f)$/' => '\1\2ves', # half, safe, wife
'/sis$/' => 'ses', # basis, diagnosis
'/ex$/' => 'ices', # index
'/([ti])um$/' => '\1a', # datum, medium
'/person$/' => 'people', # person, salesperson
'/man$/' => 'men', # man, woman, spokesman
'/foot$/' => 'feet',
'/louse$/' => 'lice',
'/mouse$/' => 'mice',
'/tooth$/' => 'teeth',
'/child$/' => 'children',
'/([nti])on$/' => 'a', # criterion
'/(.*)status$/' => '\1statuses',
'/s$/' => 's', # no change (compatibility)
'/$/' => 's' # everything else
);
// this array helps converts the key word (or last keyword in a phrase) to the singular form
$singular_rules = array(
'/(xes|ches|sses|shes)$/e' => # the /e allows an executable phrase within the replacement
"str_replace('es','',$1)", # boxes, classes
'/series$/' => 'series',
'/([^aeiouy]|qu)ies$/' => '\1y',
'/([lr])ves$/' => '\1\2f', # halves, scarves
'/([^f])ves$/' => '\1fe', # wives
'/ses$/' => 'sis', # bases, diagnoses
'/ices$/' => 'ex', # indices
'/([ti])a$/' => '\1um', # data, media
'/people$/' => 'person', # people, salespeople
'/men$/' => 'man', # men, women, spokesmen
'/feet$/' => 'foot',
'/lice$/' => 'louse',
'/mice$/' => 'louse',
'/teeth$/' => 'tooth',
'/children$/' => 'child',
'/([nti])a$/' => 'on', # criteria
'/(.*)statuses$/' => '\1status',
'/[aious]s$/' => '', # no change (compatibility)
'/s$/' => '' # everything else that ends in an 's'
);
$string=$get_search;
$match=false;
foreach($plural_rules as $pattern => $replacement)
{
if(!$match and preg_match($pattern, $string))
{
$out=preg_replace($pattern, $replacement, $string);
$match=true;
}
}
if($out and $out!=$get_search){$plural=$out;}
else{$plural=$get_search;}
$match=false;$out='';
foreach($singular_rules as $pattern => $replacement)
{
if(!$match and preg_match($pattern, $string))
{
$out=preg_replace($pattern, $replacement, $string);
$match=true;
}
}
if($out and $out!=$get_search){$singular=$out;}
else{$singular=$get_search;}
}
// the regexp will crash if we send it an empty
// variable. So we do this
if(!$get_search){$get_search='[[[]]]';}
//I typical throw in a capital letter to distinguish
// arrays from variables
$get_Search=explode(' ',$get_search);
//Yes, I know there's an easier way to weed out repeated
// spaces, but I was in a hurry
$Temp='';
foreach($get_Search as $el)
{
if($el){$Temp[]=$el;}
}
$get_Search=$Temp;
//we include all the keywords for the regexp part
$regexp=implode('|',$get_Search);
//these cell names are samples
// the first part (with all the IFs) allows us to create a
// relevance score, customized to weight the individal
// cells searched. We place a lot of weight on an exact
// name match. and we do it once for the singular form
// and once for the plural form and once for the regexp.
// $order we set previously so the search can be ordered
// by price, item name or relevance
$queryText = sprintf("
SELECT `productid`,
IF(`productname` LIKE '%%%s%%',20,0)
+IF(`description` LIKE '%%%s%%',4,0)
+IF(`manufacturer` LIKE '%%%s%%',1,0)
+IF(`productcode` LIKE '%%%s%%',1,0)
+IF(`productname` LIKE '%%%s%%',20,0)
+IF(`description` LIKE '%%%s%%',4,0)
+IF(`manufacturer` LIKE '%%%s%%',1,0)
+IF(`productcode` LIKE '%%%s%%',1,0)
+IF(`productname` REGEXP '[[:<:]]%s[[:>:]]' ,2,0)
+IF(`description` REGEXP '[[:<:]]%s[[:>:]]' ,2,0)
+IF(`manufacturer` REGEXP '[[:<:]]%s[[:>:]]' ,1,0)
+IF(`productcode` REGEXP '[[:<:]]%s[[:>:]]' ,1,0) AS `rel`
FROM `ecommerce_products`
WHERE (`productname` LIKE '%%%s%%'
OR `description` LIKE '%%%s%%'
OR `manufacturer` LIKE '%%%s%%'
OR `productcode` LIKE '%%%s%%'
OR `productname` LIKE '%%%s%%'
OR `description` LIKE '%%%s%%'
OR `manufacturer` LIKE '%%%s%%'
OR `productcode` LIKE '%%%s%%'
OR `productname` REGEXP '[[:<:]]%s[[:>:]]'
OR `description` REGEXP '[[:<:]]%s[[:>:]]'
OR `manufacturer` REGEXP '[[:<:]]%s[[:>:]]'
OR `productcode` REGEXP '[[:<:]]%s[[:>:]]') $order;
",
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp),
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($plural),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($singular),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp),
mysql_real_escape_string($regexp));
// I use a db link variable to allow error messages
$query=mysql_query($queryText, $link);
if(mysql_errno($link)){echo ": " . mysql_error($link) . "\n<hr>";}
if(mysql_num_rows($query))
{
while ($dbRow = mysql_fetch_array($query))
{
$id=$dbRow[0];
// this is to weed out duplicates
$return[$id]++;
}
// this creates a nice array of product ids for us to work with.
// using only the product ids we can begin with the entire list,
// without over burdening the server
// from here we simply grab the ones we need (based on the
// start # and pagelength #)
// using the product id, I use a function that then grabs
// all the pertinent data about that item.
foreach($return as $key=>$value)
{
$prodList[]=$key;
}

I've
been developing web sites for over 12 years. I started with HTML, moved on
to Perl and now do mostly PHP with a lot of MySQL and Javascript as well.