Internal Website Search Engine. How To Deal With Misspellings
Large (eCommerce) websites are packed with information, from product descriptions to articles and blog posts. Making that content available to search engines like Google or Bing is one thing (this is called SEO), but what about internal search? Many websites offer a search bar, but most of the time the results are poor.
How do large websites deal with this issue, and how do you build a good internal search engine? It is both difficult and time-consuming. If you want to build anything useful, simply searching your product or article tables for the user’s input won’t cut it.
Dealing with misspellings is one of the most important areas to address if you want to build a good search engine.
Misspellings
Let’s get to it. To start, create a table that contains common misspellings and points each one to the correct product, product category, article or whatever it is your website is about. This can really benefit your users and is very easy to implement. You can find common misspellings by looking at your internal search queries. For example, pick out the search queries which occurred more than once and which returned zero results.
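As a rough illustration, the log filtering described above could look something like the sketch below. The log format (a list of query and result-count pairs), the function name and the sample data are all assumptions made for this example; your own search log will almost certainly look different.

```python
from collections import Counter

def misspelling_candidates(search_log, min_count=2):
    """Return zero-result queries seen at least min_count times, most frequent first."""
    counts = Counter(
        query.strip().lower()           # normalise case and surrounding whitespace
        for query, result_count in search_log
        if result_count == 0            # only queries that returned nothing
    )
    return [(q, n) for q, n in counts.most_common() if n >= min_count]

# Hypothetical log rows: (query text, number of results returned)
log = [
    ("iphnoe", 0), ("iphone", 12), ("iphnoe", 0),
    ("Iphnoe ", 0), ("samsng", 0), ("laptop", 40),
]
candidates = misspelling_candidates(log)

# Hypothetical misspelling table, filled in by hand after reviewing the candidates
misspellings = {"iphnoe": "iphone"}
corrected = misspellings.get("iphnoe", "iphnoe")
```

The candidates still need a human (or a later automated step) to decide what each misspelling should point to; the code only surfaces the queries worth looking at.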
The next step would be to find products similar to the user’s query. For example, a user searches for a product with the part number XY12390AB. Your product database does not contain this particular product, but it does contain a product with the part number XY12380AB and a product with the part number XY12390AN. Both are very similar to the user’s query but not an exact match, so the query could be a mistype. To find these ‘close’ matches you can use the Levenshtein distance. It counts the number of single-character insertions, deletions and substitutions needed to turn one string into another, so a lower number means a closer match. Both part numbers above are at distance 1 from the query. Running such an algorithm over each product in a large product database would be too slow. To limit the number of products you need to check, you could compare only products with a part number starting with XY1 or ending with 0AB.
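Here is a minimal sketch of that idea: shortlist on a shared prefix or suffix first, then score only the shortlist with the Levenshtein distance. The function names, the three-character prefix and suffix length and the maximum distance of 2 are assumptions chosen for this example. In a real setup the shortlist would more likely come from a database query than from a Python list.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def close_matches(query, part_numbers, affix_len=3, max_distance=2):
    """Score only part numbers sharing the query's prefix or suffix."""
    prefix, suffix = query[:affix_len], query[-affix_len:]
    shortlist = [p for p in part_numbers
                 if p.startswith(prefix) or p.endswith(suffix)]
    scored = [(levenshtein(query, p), p) for p in shortlist]
    return sorted((d, p) for d, p in scored if d <= max_distance)

# Hypothetical catalogue of part numbers
catalog = ["XY12380AB", "XY12390AN", "ZZ99999QQ", "XY1ABCDEF"]
matches = close_matches("XY12390AB", catalog)
# matches == [(1, "XY12380AB"), (1, "XY12390AN")]
```

Note the trade-off: a mistype inside both the prefix and the suffix would slip past the shortlist, so the length of the prefix and suffix is something to tune against your own data.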
As a third step, you could replace all ambiguous characters with a wildcard. The pairs below are easily confused with each other. I would advise replacing both characters of each pair with a single-character wildcard when searching your database tables:
- B = 8
- I = 1
- l = 1
- O = 0
- o = 0
- G = 6 (optional)
- S = 5 (optional)
- Z = 2 (optional)
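A sketch of how this could work with SQL's LIKE operator, where an underscore matches exactly one character. This example uses Python's built-in sqlite3 module with an in-memory database. The table name, column name and sample part numbers are made up for the example, and other databases may handle the ESCAPE clause or case sensitivity differently.

```python
import sqlite3

AMBIGUOUS = set("B8I1lOo0")   # always replaced
OPTIONAL = set("G6S5Z2")      # replaced only when requested

def like_pattern(query, include_optional=False):
    """Turn a query into a LIKE pattern with '_' in place of ambiguous characters."""
    ambiguous = AMBIGUOUS | OPTIONAL if include_optional else AMBIGUOUS
    parts = []
    for ch in query:
        if ch in ambiguous:
            parts.append("_")          # '_' matches exactly one character
        elif ch in "%_\\":
            parts.append("\\" + ch)    # escape literal LIKE metacharacters
        else:
            parts.append(ch)
    return "".join(parts)

# Hypothetical products table in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (part_number TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("XY12380AB",), ("XY1239OAB",), ("XY12390AN",)],
)

pattern = like_pattern("XY12390AB")    # "XY_239_A_"
rows = conn.execute(
    "SELECT part_number FROM products "
    "WHERE part_number LIKE ? ESCAPE '\\' ORDER BY part_number",
    (pattern,),
).fetchall()
```

With this sample data the query finds XY1239OAB (a letter O typed instead of a zero), but also XY12390AN, because the wildcard matches any character and not only the confusable one. Treat the results as candidates and rank them afterwards, for instance by edit distance.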
Combining the three steps above would greatly improve your internal product search. You could use only one or two of the methods, but combining all three offers the greatest benefit.