Elasticsearch

Been trying out Elasticsearch recently. I've used Solr before, but not Elasticsearch.

The original search was producing spotty results: for some terms it was fine, for others it was useless. It was a black box, so it wasn’t particularly configurable, and it took a long time and errored or locked up whilst it was reindexing. In addition, the site it was working on is a bit niche and available in several languages, and we didn’t have the capacity to tweak the search to match. It also sometimes seemed slow.

Elasticsearch solves some of these issues out of the box. You can set up a separate index for each language, and language-specific stemmers allow better results to be returned for variants of words:
wolf, wolves, wolverine
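
As a rough sketch of the one-index-per-language idea, the snippet below builds a create-index body for each language using Elasticsearch's built-in language analyzers, which include a stemmer for that language. The four language codes, the `products_` index prefix and the `title`/`description` field names are assumptions for illustration, not our real setup, and the typeless mapping layout assumes Elasticsearch 7 or later.

```python
# Hypothetical language list: the analyzer names are built into Elasticsearch,
# but which four languages the site uses is an assumption.
LANGUAGE_ANALYZERS = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}


def index_definition(analyzer):
    """Build a create-index body whose text fields use one language analyzer."""
    return {
        "mappings": {
            "properties": {
                # Field names are illustrative.
                "title": {"type": "text", "analyzer": analyzer},
                "description": {"type": "text", "analyzer": analyzer},
            }
        }
    }


def build_indices(prefix="products"):
    """Return {index_name: body} with one index per language."""
    return {
        f"{prefix}_{lang}": index_definition(analyzer)
        for lang, analyzer in LANGUAGE_ANALYZERS.items()
    }
```

Each body would be sent when creating the matching index, and a search would then go to the index for the visitor's language.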

It also allows configuration of which fields are indexed and how search queries are processed:

  • Fuzzy search
  • Exact matching
  • Multi matching (matching across several fields)
  • Stemming
  • Synonyms
  • Shingles (which I have not tried yet)
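
To give a feel for a couple of the options above, here is a minimal sketch of two search request bodies: a fuzzy multi-field match and an exact match. The field names are assumptions, and `title.keyword` assumes the title has a `keyword` sub-field in the mapping, which may not be how a given index is set up.

```python
def fuzzy_multi_match(text):
    """Match across several fields, tolerating small typos."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Illustrative field names.
                "fields": ["title", "description"],
                # "AUTO" lets Elasticsearch pick the edit distance by term length.
                "fuzziness": "AUTO",
            }
        }
    }


def exact_match(text):
    """Match the whole, unanalysed title exactly."""
    return {
        "query": {
            # Assumes a keyword sub-field exists for the title.
            "term": {"title.keyword": text}
        }
    }
```

Either dictionary can be serialised to JSON and sent as the body of a search request.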

It can also return suggestions based on the terms in the index, rather than on dictionary words that may not appear anywhere in your content.
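
A request for suggestions might look roughly like the sketch below, which uses the term suggester. The suggestion name `did_you_mean` and the `title` field are placeholders I have picked for the example.

```python
def suggestion_request(text, field="title"):
    """Build a search body asking the term suggester for corrections.

    Suggestions are drawn from terms indexed in `field`, not from a dictionary.
    """
    return {
        "suggest": {
            # The key is an arbitrary label used to find the results in the response.
            "did_you_mean": {
                "text": text,
                "term": {"field": field},
            }
        }
    }
```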

The results seem to be quite fast: we have about 2000 products in 4 different languages, so about 8000 records.

Next steps

I’m sure we will tweak the settings over time, for example giving the titles of products and their descriptions different weights. We are already identifying some synonyms that need adding: starwars = star wars, for example, and spider-man = spider man = spiderman.
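
A possible shape for those two tweaks is sketched below: a synonym filter wired into a custom analyzer, and a query that boosts the title over the description. The filter and analyzer names, the field names and the boost of 3 are all made up for illustration; the analyzer is intended for use at search time, and the right boost would need testing.

```python
# Synonym rules taken from the examples above, in Elasticsearch's
# comma-separated "equivalent terms" format.
SYNONYMS = [
    "starwars, star wars",
    "spider-man, spider man, spiderman",
]


def synonym_settings():
    """Index settings defining a search-time analyzer that expands synonyms."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name.
                    "product_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": SYNONYMS,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "product_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "product_synonyms"],
                    }
                },
            }
        }
    }


def weighted_query(text, title_boost=3):
    """Multi-field query where a title match counts more than a description match."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # The ^N suffix is the per-field boost.
                "fields": [f"title^{title_boost}", "description"],
            }
        }
    }
```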

An issue we seem to have at the moment is terms that match many items. For example, if you currently search for ‘Marvel legends’ you get 270 items, so if you search for ‘Marvel Legends Medusa’, ‘Medusa’ is the most important part, but the code can’t readily tell that.

Whilst ‘Marvel Legends Medusa Exclusive’ might come first, you also get all the results featuring ‘Marvel legends’. It might actually be better to search for just ‘Medusa’, because that is the distinctive part or key phrase.
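
One naive way to spot the distinctive part, sketched here purely as an idea, is to pick the query term that appears in the fewest documents. This is not what our code does today, and the document counts below are invented for the example (only the 270 figure echoes the ‘Marvel legends’ search above); in practice they would have to come from the index.

```python
def most_distinctive_term(query, doc_freq):
    """Return the query term that matches the fewest documents.

    Terms with no known matches are skipped, since searching on them
    alone would return nothing. Returns None if no term is known.
    """
    terms = query.lower().split()
    known = [t for t in terms if doc_freq.get(t, 0) > 0]
    if not known:
        return None
    # The lowest document frequency is treated as the most distinctive.
    return min(known, key=lambda t: doc_freq[t])


# Made-up document frequencies for illustration only.
DOC_FREQ = {"marvel": 300, "legends": 270, "exclusive": 40, "medusa": 3}
```

With these counts, `most_distinctive_term("Marvel Legends Medusa", DOC_FREQ)` picks `"medusa"`, which could then be searched on its own or given extra weight.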

It’s probably worth a trial with a common term to see which customers prefer: results featuring lots of other Marvel Legends figures (they searched for Medusa but then go on to buy other figures in the series), or results featuring just Medusa.

We could probably show expertise by also showing figures related to Medusa, like Black Bolt.

I need to read more about key phrase extraction for identifying the ‘Medusa’ parts of queries. There is a key phrase extraction algorithm called KEA; perhaps that is the way to go. Lots of fiddling ahead :).