Search Engine Testing and Apache Lucene Eurocon Oh My!

July 27, 2011 OpenSource Connections
Category: Conference

Eric Pugh was notified today that he’s been accepted to speak on “Better Search Engine Testing” at the 2011 Apache Lucene Eurocon.

So, what, you may ask, does search engine testing entail?

A great search implementation isn’t just about installing Solr and making it run a millisecond faster. It’s about setting up the conditions to ensure that the results the search engine returns for a query are the results it should return for that query. How do you know what results the engine should return? This is where your customer has to be integrally involved. Testing search results is similar to other areas of enterprise testing in some ways and different in others. Some areas of enterprise testing don’t require customer input: for example, how much concurrent load can a site handle before it fails? Others do, such as what the expected result is when a user clicks a given link. In those cases, once the client has provided input, toolkits such as Selenium exist to automate the testing. A similar process occurs with search testing. Content owners will have the best idea of which results should come up for a given query, and once you have that list, you can automate tests that measure how closely the actual search engine results pages match the expected results.
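To make that last step concrete, here is a minimal sketch in Python of what such an automated check might look like. Everything in it is an assumption made for illustration: the function names (adherence, run_search_tests), the queries, the document IDs, and the fake_search stub that stands in for a real call to your search engine. The score used here, the fraction of expected documents that appear in the top k results, is just one simple way to measure adherence, not the only one.

```python
from typing import Callable, Dict, List


def adherence(expected: List[str], actual: List[str], k: int = 10) -> float:
    """Fraction of expected document IDs that appear in the top-k actual results."""
    if not expected:
        return 1.0  # nothing was expected, so nothing can be missing
    top_k = set(actual[:k])
    found = sum(1 for doc_id in expected if doc_id in top_k)
    return found / len(expected)


def run_search_tests(
    expectations: Dict[str, List[str]],
    search: Callable[[str], List[str]],
    k: int = 10,
    threshold: float = 1.0,
) -> Dict[str, float]:
    """Return {query: score} for every query scoring below the threshold."""
    failures = {}
    for query, expected in expectations.items():
        score = adherence(expected, search(query), k)
        if score < threshold:
            failures[query] = score
    return failures


# Hypothetical stand-in for the real search engine: maps a query to a
# ranked list of document IDs. In practice this would query your engine.
FAKE_INDEX = {
    "red shoes": ["sku-101", "sku-102", "sku-300"],
    "return policy": ["page-faq", "page-contact"],
}


def fake_search(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


# Hypothetical list supplied by the content owners: the documents that
# should come up for each important query.
EXPECTATIONS = {
    "red shoes": ["sku-101", "sku-102"],
    "return policy": ["page-returns", "page-faq"],
}

failures = run_search_tests(EXPECTATIONS, fake_search, k=3)
print(failures)
```

Passing the search function in as an argument keeps the comparison logic separate from the engine, so the same expectations list could be run against a stub, a staging index, or production.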

Testing, though, is never a complete process. Except on the most static of sites, content changes. Usually, changing content reflects changing business priorities, and as those priorities and the resulting content change, so too should the expected results in the tests. Additionally, the important queries themselves may change over time. What was important a year ago may no longer be as critical or as profitable. As a result, content owners should regularly revisit both the list of important queries and the results those queries should return, to get a feel for how the search engine is performing against expectations. The engine may require tuning, but only through constant testing will the content owner know that tuning is required.

“What about the long tail?” some will ask. To answer the question of the long tail, the customer must first answer the question of where the money comes from. Does it truly come from the long tail, as is the case with eBay or Amazon, or does the bulk of the profit come from a few queries? Remember, you can tune for the long tail, but it cannot come at the expense of other, more profitable queries. Just as a mathematician can derive a formula to match any curve but often cannot predict the next point, it is possible to overtune a search engine to match the long tail but miss the next big revenue opportunity.

What do you think? Am I close? Would you sign up to hear more at the session? Talk to us and let us know whether this matches what you see with your own search engine implementation.