Friday, March 18, 2011

DBSight - combining Levenshtein distance and Double Metaphone

My attempt to make last-name search friendlier to users.

Decision: Use DBSight with multiple analyzers: numberOrLowerCase and Double Metaphone.

Problem:
People don't always spell names correctly. For example, I want to search for 'picasso', but I misspell it as 'picaso' (with a missing s). In the search results, 'Bachs' along with a bunch of other names showed up with the same score as 'picasso'. What I want is for 'picasso' to show up at the top of the list. So I'm thinking that if I can put Levenshtein distance in there and give it a higher weight, I might be able to solve this problem. (configure search --> searchable columns)
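To make the idea concrete, here is a small sketch of Levenshtein (edit) distance in plain JavaScript. It is only an illustration of the measure itself, not the code DBSight or Lucene runs internally, and the function name is my own.

```javascript
// Classic dynamic-programming Levenshtein (edit) distance.
// Illustration only; not DBSight's or Lucene's internal code.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('picaso', 'picasso')); // 1
console.log(levenshtein('picaso', 'bachs'));   // 4
```

'picaso' is one edit away from 'picasso' but four edits away from 'bachs', which is why an edit-distance match can separate names that sound alike to a phonetic analyzer.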




Solution:
lastNameT is for text search, so when I type in "picasso" (the correct spelling), the exact match shows up at the top. This field is also used for stemming or Levenshtein distance, so that names that are spelled similarly weigh more than the phonetic matches. If I only use phonetic matching, names with similar sounds all show up with the same score.
lastNameP is the phonetic one.


So here is my setup:

lastNameT and lastNameP hold the same value; they differ only in analyzer and weight. To achieve this, I did the following.

1. getData
Data Source->select data->sql

----------------------------------------
select id, 
lastName as lastNameT,
lastName as lastNameP
from mytable


------------------------------------------------------

1.2 My id is the primary key, and the names are of FieldType text.


2. adjust analyzer
Data source -> language
Change the analyzer setting for lastNameT and lastNameP to numberOrLowerCase and Double Metaphone, respectively.


3. enable spellcheck (it's really nice to have)
Data source -> spell check
Check the checkbox on lastNameT.
Check the checkbox on "use index-specific" spell checking. (Using a regular dictionary to correct my spelling of a person's last name wouldn't make much sense, since I'm only interested in what my database has to offer.)


4. adjust weight (depending on needs)
fieldName    type      FieldType    Analyzer             weight
lastNameT    String    text         numberOrLowerCase    2.0
lastNameP    String    text         Double Metaphone     1.0
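To see why the two weights push the closely spelled name ahead, here is a toy sketch. It assumes a hit's score is simply the sum of the weights of the fields it matched, which is a big simplification of what the search engine really computes. The hit list and the function name are made up for illustration.

```javascript
// Field weights taken from the table above.
const WEIGHTS = { lastNameT: 2.0, lastNameP: 1.0 };

// Toy score: sum the weights of the fields that matched.
// Assumption: the real engine's scoring is more involved than this.
function toyScore(matchedFields) {
  return matchedFields.reduce((sum, field) => sum + (WEIGHTS[field] || 0), 0);
}

// Hypothetical hits for the misspelled query 'picaso':
// 'Bachs' only sounds alike, 'Picasso' also matches on spelling (fuzzy text match).
const hits = [
  { name: 'Bachs', matched: ['lastNameP'] },
  { name: 'Picasso', matched: ['lastNameT', 'lastNameP'] },
];

// Attach a score to each hit and sort from highest to lowest.
const ranked = hits
  .map(h => ({ name: h.name, score: toyScore(h.matched) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map(h => h.name)); // [ 'Picasso', 'Bachs' ]
```

With only the phonetic field, both names would tie; the extra weight on the text field is what breaks the tie.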

5. enable wildcard search (optional)
configure search -> wild-card
This part is self-explanatory.

Make a template, type 'picaso' in the search box, hit enter, and then add &lq=lastNameT:picaso~0.4 to the end of the URL.

There you have it! picasso at the top of the list!
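If you'd rather build that URL in code than type it by hand, something like the sketch below would do it. The function name and the example.com address are placeholders of mine, not anything DBSight defines; only the lq=lastNameT:picaso~0.4 part comes from the step above. Note that encodeURIComponent turns the colon into %3A, which decodes back to the same query on the server side.

```javascript
// Append a fuzzy lq clause to an existing search URL.
// searchUrl is whatever URL your search page already produced (placeholder below).
function withFuzzyLq(searchUrl, field, term, minSimilarity) {
  const lq = field + ':' + term + '~' + minSimilarity;
  // Use '&' if the URL already has a query string, otherwise start one with '?'.
  const sep = searchUrl.includes('?') ? '&' : '?';
  return searchUrl + sep + 'lq=' + encodeURIComponent(lq);
}

console.log(withFuzzyLq('http://example.com/search?q=picaso', 'lastNameT', 'picaso', 0.4));
// http://example.com/search?q=picaso&lq=lastNameT%3Apicaso~0.4
```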

Because I don't want to modify the URL in the address bar every time I do a search, I made the following modification to the template so that it fills in the lq parameter and sends it along with the form submit.

6. create and modify display template
  display template -> create from scaffold
 6.2 create a display template (I used the client-side sortable table)
 6.3 modify the template
display template -> list
      Click on the template name, locate searchBox.ftl, and add the following JavaScript to the end of the file.
--------------------------------------------------------
<script>
function populateLQ() {
  var newtext = document.search_demo.q.value;  // text typed into the search box
  document.search_demo.lq.value = 'lastNameT:' + newtext + '~0.4';  // fuzzy query on lastNameT
}
</script>
--------------------------------------------------------
Note 1: on lines 3 and 4, document.search_demo must be modified to match the name of your index, e.g. document.search_myNameIndex.q.value.
Note 2: notice the ~0.4. That is the fuzzy-match threshold I use, because I think 0.4 is close enough.

add a hidden field
<input type="hidden" name="lq" id="lq" value="">

modify the input text field of the search box
From
<input type="text" name="q" id="q" size="41" maxlength="2048" value="${searchResult.userInput?html}">&nbsp;
To:
<input type="text" name="q" id="q" size="41" maxlength="2048" value="${searchResult.userInput?html}" onchange="populateLQ();">&nbsp;

This way, changing the input also updates the hidden field that gets passed along with the search.
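One thing populateLQ() does not guard against is an empty search box, which would submit a dangling "lastNameT:~0.4". Below is a small sketch of a helper that could sit next to it; buildLq is a name I made up, and it assumes a single-word last name like in the example.

```javascript
// Build the lq value for a last-name search.
// Returns '' for blank input so no dangling "lastNameT:~0.4" clause is submitted.
// Assumption: the input is a single word; multi-word names are not handled here.
function buildLq(rawInput, field = 'lastNameT', minSimilarity = 0.4) {
  const term = rawInput.trim();
  if (term === '') return '';
  return `${field}:${term}~${minSimilarity}`;
}

// Possible use inside populateLQ(), with the form name from the script above:
//   document.search_demo.lq.value = buildLq(document.search_demo.q.value);
console.log(buildLq('  picaso ')); // lastNameT:picaso~0.4
console.log(buildLq('   ') === ''); // true
```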

Done!

Afterthoughts:
I later created another index that searches on the combination of first name and last name, by concatenating the two separated by a space, and added autocomplete from partial scaffold -> "search suggestion", which is much more user friendly.

I'm sure there are many other, and much better, ways of doing this. So please share your criticisms, thoughts, or whatever you have in mind.

Thanks Paul, Will. (DBSight is awesome!)

Check out their wiki for more info

http://wiki.dbsight.com/index.php?title=Exact_Search_plus_fuzzy_double_Metaphone_search