Posts Tagged ‘solr’

Things I Learned About Last Week

Posted Tuesday, March 9th, 2010 by Eric Pugh

Last week was the crucial week on my current Lucene -> Solr project for making our goals.  A lot of work the previous couple of weeks came together.  I wanted to take a couple of minutes and just record some of the little things that I’ve been learning about:

Solr

Sunspot is the up-and-coming solution for integrating Solr into Ruby on Rails, and fortunately enough, the 1.0 release (followed quickly by 1.0.1!) came out just last week.  Between acts_as_solr and Sunspot, Sunspot wins hands down for its support of master/slave Solr configurations, embedded Solr for testing, richer indexing semantics, and not being tied to ActiveRecord.  The companion sunspot_rails gem does give wonderful ActiveRecord integration, however.

Solr cores are the bee’s knees!  We’ve built a simple RoR webapp using HTTParty and the Solr API that allows you to perform all the admin functions for cores, and allows you to quickly clone a core for your own nefarious purposes!  It simplifies hacking around with a new schema or configuration without having a local copy of Solr running, and it allows multiple QA environments to potentially share a single Solr infrastructure.
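
Here is a minimal sketch of the kind of request such a tool sends, not the actual app.  It assumes Solr is running on localhost:8983 with the CoreAdmin handler at its usual /admin/cores path, and that "cloning" means copying a core's instance directory on disk and then registering the copy with a CREATE call.  The helper names and the core name are made up for illustration.

```ruby
require "uri"

# Assumption: a local Solr with the CoreAdmin handler at the usual path.
SOLR_BASE = "http://localhost:8983/solr"

# Build a CoreAdmin request URL for an action such as STATUS, CREATE or RELOAD.
def core_admin_url(action, params = {})
  query = URI.encode_www_form({ "action" => action }.merge(params))
  "#{SOLR_BASE}/admin/cores?#{query}"
end

# Hypothetical "clone" step: the instance directory has already been copied
# on disk (not shown), so all that is left is registering it as a new core.
def clone_core_url(new_name, copied_instance_dir)
  core_admin_url("CREATE", { "name" => new_name, "instanceDir" => copied_instance_dir })
end

puts core_admin_url("STATUS")
puts clone_core_url("books_scratch", "cores/books_scratch")
```

Nothing is sent over the wire here; passing either URL to HTTParty.get (or Net::HTTP.get with a URI) would issue the request.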

Solr master and slave setup in a single VM.  While pointless from a scaling perspective, it’s a really great way to work out the kinks!  It’s funny to see a slave core polling the same Solr VM it’s in for updated segments!
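
To make the "polling itself" picture concrete, here is a rough sketch of the URLs involved.  It assumes two cores named master and slave (hypothetical names) in one Solr instance, each with the replication handler mounted at /replication.  The indexversion and fetchindex commands are part of Solr's HTTP replication API; everything else is illustrative.

```ruby
require "uri"

# Assumption: both cores live in the same Solr instance on this host.
SOLR_ROOT = "http://localhost:8983/solr"

# URL of a core's replication handler, optionally with a command attached.
def replication_url(core, command = nil)
  url = "#{SOLR_ROOT}/#{core}/replication"
  return url if command.nil?
  "#{url}?#{URI.encode_www_form({ 'command' => command })}"
end

master_url  = replication_url("master")                 # where the slave's masterUrl would point
version_url = replication_url("master", "indexversion") # what gets asked when polling
pull_url    = replication_url("slave", "fetchindex")    # force the slave to pull right now

puts master_url, version_url, pull_url
```

All three URLs share the same host and port, which is exactly why the setup proves nothing about scaling but exercises the whole replication path.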

JRuby

Doesn’t suck after all.  Actually, maybe I should say that JBoss, when combined with JRuby, means that JBoss doesn’t suck so much.  I had the aforementioned Solr core admin tool bundled up as a WAR file with JRuby, and was able to deploy it to an existing environment that had JBoss installed!  I didn’t have to install Ruby on the box (or JRuby, for that matter)!  I just deployed the WAR file and bam, off to the races.  Ops folks get the JBoss they love, I get the Ruby on Rails that I love.

And on a related note, Warbler was the key to thinking JRuby is cool.  I’d never actually had to package up a RoR app, so Warbler came to the rescue.  And you know what?  It was nice to build a single file that I knew had everything that I needed in it that could be scp’ed around!  And thanks to some cool code in the environment.rb, my app was able to load up the right configuration file for the environment based on an environment variable set in JBoss.
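
The environment.rb trick can be sketched roughly like this.  This is not the code from the app: the variable name APP_DEPLOY_ENV, the config/environments directory, and the one-YAML-file-per-environment layout are all assumptions made for illustration.

```ruby
require "yaml"

# Hypothetical variable name; the real one is whatever gets set in JBoss.
ENV_VAR = "APP_DEPLOY_ENV"

# Pick the configuration file for the current deployment environment,
# falling back to "development" when the variable is not set.
def config_file_for(env = ENV, config_dir = "config/environments")
  name = env.fetch(ENV_VAR, "development")
  File.join(config_dir, "#{name}.yml")
end

# Load the chosen file as a Hash of settings.
def load_app_config(env = ENV, config_dir = "config/environments")
  YAML.load_file(config_file_for(env, config_dir))
end

puts config_file_for({ ENV_VAR => "qa" })
```

Taking the environment as a parameter keeps the lookup easy to test with a plain Hash.  If the value were set as a JVM system property rather than an OS environment variable, it would need to be read from there instead.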

Virtual Machines

I recently migrated a Linux VPS-based RoR + Solr app (see a trend in tech choices ;-) ) to a Windows environment.  And to deliver the new Windows environment, I used VirtualBox to host the Windows Vista environment on my Mac laptop.

A couple of notes:

  • VirtualBox may not have all the snazzy integration points of Parallels with the host computer like seamless application sharing, but it seems to be much lighter weight.  Starts up quicker, and I don’t get the spinning beach ball of death as much.
  • If you are shipping an 11 GB file, you can’t use a 16 GB USB memory stick…  Turns out the biggest single file it will hold is 4 GB, which is the FAT32 limit.  (Although I never tried formatting the stick as NTFS, maybe that would have allowed a single 11 GB file???)
  • Uploading 11 GB to a remote server out on the internet will take a long long long time.  Even on a really fast network connection.
  • If you need to format an external USB hard drive as NTFS on a Mac, it is possible!  Just fire up your trusty Windows Vista image in Parallels, plug the USB drive in, download and install the correct USB drivers so the drive doesn’t show up as a network share mapped to the Mac, and then use the built-in reformatting tools!  Warning: This will take a loooong time!
  • Lastly, if you are using VirtualBox, and you attempt to create a Windows XP machine, and attach a Windows Vista hard disk image to it, VirtualBox will let you!  And then Windows won’t start.  sigh.
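
For the curious, the back-of-the-envelope arithmetic behind the two 11 GB notes looks something like this.  It assumes the stick was formatted as FAT32 (4 GiB minus one byte per file) and uses a made-up sustained upload rate of 1 megabit per second; plug in your own numbers.

```ruby
# Largest single file a FAT32 volume can hold: 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 4 * 1024**3 - 1

# Number of pieces a file must be split into so each piece fits under the limit.
def parts_needed(file_bytes, max_bytes = FAT32_MAX_FILE_BYTES)
  (file_bytes + max_bytes - 1) / max_bytes   # integer ceiling division
end

# Rough transfer time in hours at a sustained upload rate in megabits per second.
def upload_hours(file_bytes, megabits_per_second)
  (file_bytes * 8.0) / (megabits_per_second * 1_000_000) / 3600
end

image_bytes = 11 * 1024**3   # the 11 GB VM image

puts parts_needed(image_bytes)                # pieces needed to fit on the stick
puts upload_hours(image_bytes, 1.0).round(1)  # hours at the assumed 1 Mbit/s
```

So the image would have had to be split into three pieces to fit on the stick, and at that assumed upload rate the transfer runs for more than a day.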

Eric Pugh to speak on Solr at Shenandoah Ruby Users Group October 27th

Posted Tuesday, October 20th, 2009 by Eric Pugh

From the Meetup site:

We’ll look at the thriving Ruby ecosystem that has grown up around integrating with Solr. From Ruby gems that integrate with Solr like solrb and rsolr, to general search solutions like acts_as_solr and sunspot. We’ll also look at a complete “shrink wrapped” catalog solution for Solr using BlacklightOPAC.

You’ll learn the basics of getting started with Solr, and gain an understanding of what Ruby solutions are available to simplify adding great search to your site!

As usual, food and beverages will be provided.

Solr 1.4 Enterprise Search Server Book is Released!

Posted Wednesday, August 19th, 2009 by Eric Pugh

Solr 1.4 Enterprise Search Server Book Cover

I am very proud to announce the first book on Solr has been published by Packt. This has been a labor of love for me and my co-author David Smiley, and we are excited to see the book now “in the wild”! Below is a copy of the email sent to the Solr community:

Fellow Solr users,

I’ve finally finished the book “Solr 1.4 Enterprise Search Server” with my co-author Eric. We are proud to present the first book on Solr and hope you find it a valuable resource. You can find full details about the book and purchase it here:
http://www.packtpub.com/solr-1-4-enterprise-search-server/book
It can be pre-ordered at a discount now and should be shipping within a week or two. The book is also available through Amazon. You can feel good about the purchase knowing that 5% of each sale goes to support the Apache Software Foundation. For a free sample, there is a portion of chapter 5 covering faceting available as an article online here:
http://www.packtpub.com/article/faceting-in-solr-1.4-enterprise-search-server

By the way, we realize Solr 1.4 isn’t out [quite] yet. It is feature-frozen however, and there’s little in the forthcoming release that isn’t covered in our book. About the only notable thing that comes to mind is the contrib module on search result clustering. However Eric plans to write a free online article available from Packt Publishing on that very subject.

“Solr 1.4 Enterprise Search Server” In Detail:

If you are a developer building a high-traffic web site, you need to have a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server, which uses and extends the Lucene search library. This is the first book on the market on Solr and it will show you how to optimize your web site for high volume web traffic with full-text search capabilities along with loads of customization options. So, let your users gain a terrific search experience.

This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate it with other languages and frameworks.

This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture, where all databases and document/web crawlers fall short, and Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import this data in various ways from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including Solr’s rich query syntax, “boosting” match scores based on record data and other means, searching across multiple fields with different boosts, getting facets on the results, auto-completing user queries, spell-correcting searches, highlighting queried text in search results, and so on.

After this thorough tour, we’ll demonstrate working examples of integrating a variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python.

Finally, we’ll cover various deployment considerations, including indexing strategies and performance-oriented configuration that will enable you to scale Solr to meet the needs of a high-volume site.

Sincerely,

David Smiley (primary-author)
dsmiley@mitre.org
Eric Pugh (co-author)
epugh@opensourceconnections.com

A huge round of thanks goes to David for bringing me into this project and being such a great partner on it! With 5% of the proceeds going to the Apache Software Foundation, here’s hoping it’s a great success!