Modifying Solr Result Relevancy via an "Auxiliary Boost" Field

April 10, 2013 John Berryman

English is a confusing language. I mean, does it really make sense that you can park in a driveway or drive on a parkway? Also, I've always been amused that there actually exists a class of words that are their own antonyms – so-called “auto-antonyms”:

cleave – 1] Split or sever (something) 2] Stick fast to

awful – 1] worthy of awe 2] very bad

to overlook – 1] to inspect 2] to fail to notice

Unfortunately, the confusing nature of English (and of all natural languages) sometimes has consequences that can affect our bottom line. Consider a situation that Zappos was facing some time back with their search results: if I am looking for a pair of “dress shoes”, then what should I expect to see?

You would expect that I would see a page full of brown or black leather shoes, right? Unfortunately, Solr had a different opinion. By default, the page was filled not only with dress shoes, but also with sundresses, tennis shoes, and dress pants! And in some ways it makes sense, right? Under the hood, Solr is really just a sophisticated and performant token-matching engine: a sundress matches the token “dress” and a tennis shoe matches the token “shoes”, so both qualify as results.

Fortunately for Zappos, much of their problem was alleviated by boosting higher on phrase matches, so that if “dress” and “shoes” occurred next to each other in the text, then that document would rise toward the top. However, some e-commerce sites have a great deal of difficulty with this problem, and it drives them toward extreme and even somewhat detrimental approaches. For instance, some companies build in special-case solutions – band-aid solutions – so that if they see a particular query string, they completely circumvent their search engine and provide a hand-tailored set of results. This is a very brittle approach because with every update to the inventory, with every new partnership, and with every new advertising campaign, someone must review each of these band-aid fixes and make sure they are still relevant.

There's a better approach, beautiful in its simplicity and its flexibility. Solr and Elasticsearch view each item in your inventory as a document with various fields, each of which has its own value. So for Zappos, a document might contain a SKU, an item name, a brand name, a description, and a price. But there's no reason that you can't include additional fields that are used to modify the relevancy of a particular document in a particular search. We call these fields auxiliary boosting fields.

They work like this: Consider again the dress shoes problem. If every document in your index has two additional fields, AuxiliaryBoost and AuxiliaryBust, then we can tightly control the search results and the way they are sorted. As a merchandizing expert, if you see a document that should not appear in the search results, a sundress for example, then you add the offending query string to that document's AuxiliaryBust field. Likewise, if you find a document that really should be sorted higher in the result set, then you add the query string to that document's AuxiliaryBoost field.

The final piece of this puzzle is a slight modification that you make to the actual query that goes to Solr. To get rid of all bad results, you add a filter query to remove those documents that have a match in the AuxiliaryBust field:

fq=-AuxiliaryBust:(dress AND shoes)
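As a rough illustration, the filter above could be generated from whatever the shopper typed. This is only a minimal sketch: the helper name `build_bust_filter` is hypothetical, the whitespace tokenization is an assumption (a real deployment would need to agree with the analyzer configured on the field), and only the field name AuxiliaryBust comes from the text above.

```python
import re

# Characters and operators with special meaning in Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def build_bust_filter(user_query, field="AuxiliaryBust"):
    """Return an fq value that excludes documents whose bust field
    contains every term of the user's query, or None for an empty query.

    Hypothetical helper: whitespace splitting is a simplifying assumption.
    """
    # Backslash-escape special characters so terms are matched literally.
    terms = [_SPECIAL.sub(r"\\\1", term) for term in user_query.split()]
    if not terms:
        return None
    return "-{}:({})".format(field, " AND ".join(terms))


fq = build_bust_filter("dress shoes")
```

For the query “dress shoes” this produces the same filter string shown above.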

To promote those documents that really deserve to be at the top, you simply add the AuxiliaryBoost field to the set of fields that you're searching over and apply appropriate boosting:

qf=SKU^10 ItemName^5 ItemDescription^3 Brand^4 AuxiliaryBoost^1
pf=ItemDescription^3 AuxiliaryBoost^2
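Putting the pieces together, a request to Solr might be assembled along these lines. This is a sketch under stated assumptions: `qf` and `pf` are parameters of Solr's dismax-family query parsers, but the choice of `edismax` here is a guess, since the post does not say which parser is in use. The function name `build_search_params` is hypothetical, and the field names and boosts are copied from the example above.

```python
from urllib.parse import urlencode


def build_search_params(user_query):
    """Assemble Solr request parameters that use both auxiliary fields.

    Hypothetical helper; defType=edismax is an assumption.
    """
    params = {
        "defType": "edismax",  # assumption: a dismax-family parser is in use
        "q": user_query,
        # Fields searched for individual terms, with per-field boosts.
        "qf": "SKU^10 ItemName^5 ItemDescription^3 Brand^4 AuxiliaryBoost^1",
        # Fields given an extra boost when the terms occur as a phrase.
        "pf": "ItemDescription^3 AuxiliaryBoost^2",
    }
    terms = user_query.split()  # assumption: plain whitespace tokenization
    if terms:
        # Drop documents that a merchandizer has "busted" for this query.
        params["fq"] = "-AuxiliaryBust:({})".format(" AND ".join(terms))
    return params


params = build_search_params("dress shoes")
query_string = urlencode(params)  # ready to append to a select URL
```

Keeping the exclusion in `fq` rather than in `q` means it removes documents without influencing the scores of the ones that remain.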

Now, if you're a merchandizing expert reading this, you're probably becoming upset again at this point because you have no easy way of adding fields or of modifying the text they contain. Furthermore, if you have to adjust the boosting of particular fields, your hands are equally tied. We have recognized this issue over and over again, and as a result we are in the process of building SolrPanl – a merchandizer-facing search behavior dashboard. As a merchandizer, SolrPanl will allow you to create a test case of “troubled searches” to monitor and modify. If you see a search that has particularly bad results, then you will be able to adjust the boosting of various fields with a simple UI composed of sliders and selection boxes. As you modify these parameters, you can see immediately how the search results are affected. (In the past, you would have to tell your tech team to make a modification and then check back later to see the results.) If you find that a document appears lower in a particular search result set than it should, then we will provide you with the tools to understand why that is happening. Finally, you will also be able to modify the documents directly by adding query strings to fields such as AuxiliaryBoost and AuxiliaryBust. You can even do simple things such as fixing typos!

If you're interested, then please follow our ongoing development of SolrPanl here. Also, ask us about becoming a beta tester!

