Archive for the ‘Programming’ Category

Things I Learned About Last Week

Posted Tuesday, March 9th, 2010 by Eric Pugh

Last week was the crucial week on my current Lucene -> Solr project for making our goals.  A lot of work the previous couple of weeks came together.  I wanted to take a couple of minutes and just record some of the little things that I’ve been learning about:

Solr

Sunspot is the up-and-coming solution for integrating Solr into Ruby on Rails, and fortunately enough, the 1.0 release (followed quickly by 1.0.1!) came out just last week.  Between acts_as_solr and Sunspot, Sunspot wins hands down for its support of master/slave Solr configurations, embedded Solr for testing, richer indexing semantics, and not being tied to ActiveRecord.  The companion sunspot_rails gem does give wonderful ActiveRecord integration, however.

Solr cores are the bee’s knees!  We’ve built a simple RoR webapp using HTTParty and the Solr API that allows you to perform all the admin functions for cores, and allows you to quickly clone a core for your own nefarious purposes!  It simplifies hacking around with a new schema or configuration without having a local copy of Solr running, and it allows multiple QA environments to potentially share a single Solr infrastructure.
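
To give a feel for what such a tool talks to, here is a minimal sketch of building Solr CoreAdmin request URLs in Ruby. It is not the code from our webapp: the CoreAdminUrls class, the base URL, and the core names are made up for illustration, and it only builds the URLs rather than sending them.

```ruby
require "uri"

# Hypothetical helper (not from the actual webapp): builds request URLs
# for Solr's CoreAdmin handler. The base URL is an assumed default.
class CoreAdminUrls
  def initialize(base_url = "http://localhost:8983/solr")
    @base_url = base_url
  end

  # Status of every core, or of a single core when a name is given.
  def status(core = nil)
    build("STATUS", core ? { "core" => core } : {})
  end

  # Registers a new core whose files already live in instance_dir.
  def create(name, instance_dir)
    build("CREATE", { "name" => name, "instanceDir" => instance_dir })
  end

  # Reloads a core so it picks up schema or config changes.
  def reload(core)
    build("RELOAD", { "core" => core })
  end

  # Removes a core from the running Solr instance.
  def unload(core)
    build("UNLOAD", { "core" => core })
  end

  private

  # Joins the action and its parameters into an encoded query string.
  def build(action, params = {})
    query = URI.encode_www_form({ "action" => action }.merge(params))
    "#{@base_url}/admin/cores?#{query}"
  end
end
```

With HTTParty loaded, something like HTTParty.get(CoreAdminUrls.new.status) would then fetch the status of all cores.  Cloning a core is more than one call: roughly, copy the instance directory on disk and then issue a CREATE pointing at the copy.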

Solr master and slave setup in a single VM.  While pointless from a scaling perspective, it’s a really great way to work out the kinks!  It’s funny to see a slave core polling the same Solr VM it’s in for updated segments!

JRuby

Doesn’t suck after all.  Actually, maybe I should say that JBoss, when combined with JRuby, means that JBoss doesn’t suck so much.  I had the aforementioned Solr core admin tool bundled up as a WAR file with JRuby, and was able to deploy it to an existing environment that had JBoss installed!  I didn’t have to install Ruby on the box (or JRuby, for that matter!).  I just deployed the WAR file and bam, off to the races.  Ops folks get the JBoss they love, I get the Ruby on Rails that I love.

And on a related note, Warbler was the key to thinking JRuby is cool.  I’d never actually had to package up a RoR app, so Warbler came to the rescue.  And you know what?  It was nice to build a single file that I knew had everything that I needed in it that could be scp’ed around!  And thanks to some cool code in the environment.rb, my app was able to load up the right configuration file for the environment based on an environment variable set in JBoss.
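
The environment.rb trick can be sketched roughly like this. It is an illustration rather than the code from my app: the APP_ENV variable name, the list of environments, and the config/<name>.yml layout are all assumptions.

```ruby
# Hypothetical sketch: pick a per-environment config file based on an
# environment variable. APP_ENV, the environment names, and the
# config/<name>.yml naming are assumptions made for illustration.
KNOWN_ENVIRONMENTS = %w[development qa production].freeze

# env defaults to the real process environment, but any Hash-like object
# with #fetch works, which keeps the method easy to try out.
def config_path_for(env = ENV, root = "config")
  name = env.fetch("APP_ENV", "development")
  unless KNOWN_ENVIRONMENTS.include?(name)
    raise ArgumentError, "unknown environment: #{name}"
  end
  File.join(root, "#{name}.yml")
end
```

In environment.rb you would then load the chosen file, for example with YAML.load_file(config_path_for).  Failing loudly on an unknown name is a deliberate choice here: a typo in the server setup should stop the deploy rather than silently fall back to development settings.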

Virtual Machines

I recently migrated a Linux VPS based RoR + Solr app (see a trend in tech choices ;-) ) to a Windows environment.  And to deliver the new Windows environment, I used VirtualBox to host the Windows Vista environment on my Mac laptop.

A couple of notes:

  • VirtualBox may not have all the snazzy integration points of Parallels with the host computer like seamless application sharing, but it seems to be much lighter weight.  Starts up quicker, and I don’t get the spinning beach ball of death as much.
  • If you are shipping an 11 GB file, you can’t use a 16 GB USB Memory Stick…  Turns out the biggest single file the stick would take is 4 GB.  (Although I never tried formatting the stick as NTFS, maybe that would have allowed a single 11 GB file???)
  • Uploading 11 GB to a remote server out on the internet will take a long long long time.  Even on a really fast network connection.
  • If you need to format an external USB hard drive as NTFS on a Mac, it is possible!  Just fire up your trusty Windows Vista image in Parallels, plug the USB drive in, download and install the correct USB drivers so the drive doesn’t show up as a network share mapped to the Mac, and then use the built-in reformatting tools!  Warning: This will take a loooong time!
  • Lastly, if you are using VirtualBox, and you attempt to create a Windows XP machine, and attach a Windows Vista hard disk image to it, VirtualBox will let you!  And then Windows won’t start.  sigh.
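
On the 4 GB point: that ceiling matches FAT32, whose largest allowed file is 4 GiB minus one byte.  Assuming the stick was FAT32 (I never checked), a quick back-of-the-envelope sketch shows how many pieces the disk image would have had to be split into.  The helper name is made up for illustration.

```ruby
# FAT32 caps a single file at 4 GiB minus one byte.
FAT32_MAX_FILE_BYTES = 4 * 1024**3 - 1

# Number of pieces needed so that no piece exceeds max_chunk_bytes.
# Uses integer arithmetic to round the division up.
def chunks_needed(file_bytes, max_chunk_bytes = FAT32_MAX_FILE_BYTES)
  (file_bytes + max_chunk_bytes - 1) / max_chunk_bytes
end

puts chunks_needed(11 * 1024**3) # an 11 GiB image needs 3 pieces
```

So the image would have fit on the 16 GB stick as three pieces, just not as one file.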

Microsoft Abandons FAST On Linux and Unix and Opens the Door For Solr

Posted Monday, February 8th, 2010 by Jason Hull

Today, Microsoft announced that it was abandoning development of the FAST search engine for Linux and Unix.  Given that Microsoft paid $1.2 billion for FAST, the move is an apparent revelation of a strategy to get non-Windows based users to move to an enterprise Windows platform rather than to continue to support FAST.

This move seems to be risky.  The Microsoft bet is that its FAST customers are more loyal to FAST than they are to the operating platform, and that they perceive the cost of switching from FAST to another enterprise search engine as higher than the cost of switching operating systems.  The opposite would be a loyalty to the operating system and a perception that search engines are interchangeable.

Microsoft might be right for most of its customers, but this announcement will certainly be grist for the mill in IT departments over the coming weeks.  Many companies built their IT infrastructure around a Linux-based platform, and being forced to change to a Windows environment may be a pill that is too hard to swallow.  The alternative will be to look to other search engines, which can do nothing but help Solr and Lucene.  With an established user base, enterprise grade support packages from companies like Lucid Imagination, and a significantly lower total cost of ownership than the FAST + Windows package, Solr will appeal to many a CTO who might otherwise have continued to gladly pay the licensing costs for FAST but is now forced to reconsider his or her decision.

Rather than supporting FAST on both platforms at the cost of a few developers, Microsoft may lose many more customers and revenues because of its insistence on one platform.  It will be interesting to see how companies like Lucid respond to the new opportunity.

Notes from using LucidWorks for Solr Distro

Posted Thursday, January 28th, 2010 by Eric Pugh

I’ve been playing with the LucidWorks for Solr distribution of Solr 1.4, and wanted to share some of the things I had noticed about it. The LucidWorks distro is Solr 1.4 with patches and enhancements from Lucid added in.

Installer

The first thing you’ll notice is that an installer (and uninstaller) is provided that walks you through the basic steps of installing Solr. Now Solr itself is pretty darn simple to work with already, but you do need to compile the code, which means you need Ant to be installed. The Lucid installer avoids that need, and adds support for running Solr in Tomcat as well as Jetty. And, assuming you have a support agreement with Lucid, it supports downloading plugins from Lucid to extend your Solr platform. Right now the only free plugin is the Reference Guide PDF. Having an installer available definitely checks a box for the systems type folks who may be installing Solr, but it doesn’t really do anything crazy special. Also, one nit is that if you install into /opt/dirA, and then want to install into /opt/dirB, you have to delete the ~/.LucidWorks/ directory, as the install dir is cached!  But it does demonstrate what might be coming from Lucid in future updates!

Installer Targets Screen

Another enhancement from Lucid is a Tray Application for managing your Solr instances. However, this turns out to just be a basic (on OSX at least!) menubar application that allows you to start/stop a local Solr server. There don’t seem to be any options to stop and start remote servers, or monitor the health of running Solrs, so I think this is something you use once and never again! Hey Lucid, it would be great though if the Tray App integrated stoplight monitoring of Solr instances and popped open web pages to admin pages to perform various tasks on your collection of Solr servers!

Directory Layout

The directory that you’ve installed Solr into should look very familiar. In fact, too familiar to me! I’ve gone back and forth on the way that Solr is distributed with source code as well as compiled jars. While Solr used to be a tool that only Java-centric shops would look at, it’s now gone mainstream, to where many, if not most, organizations that use Solr are not traditional Java shops! I really wish I could download a version of Solr that didn’t have the src directory and was just a stripped-down, ready-to-go application. Admittedly, the example application that is part of the source functions as a template, but it has been bemoaned by myself and others that folks just use and abuse the configuration of what was meant as an example app, to their detriment!

So I was hoping that the LucidWorks distro’s installer would function as that smart template by walking me through including/excluding various extensions like DIH, Clustering, and Extraction. But at least in this first version, no such luck. The support for picking either Tomcat or Jetty as a container shows what could be in the offing though!

While the LucidWorks distro still ships with the hoary old example directory, there is now a lucidworks directory as well. When you run the new top-level start.sh shell script, it starts Solr with solr.solr.home set to the lucidworks/solr directory. Something to note is that start.sh has complete paths defined in it from the installer:

cd /Users/epugh/solr/solr2/LucidWorks/lucidworks/jetty/../

It really should at least have a single variable at the top that you can change depending on what environment you are in.

The lucidworks project is also set up as a single index project.  Since the future is multicore configurations, I’d like to see that as the default in more examples.  (The example app needs a bit of work as well to better show off multicore as a first class feature!)

solrconfig.xml

Doing a diff on the example and lucidworks versions of solrconfig.xml shows that the lucidworks one is pretty much the same as the one from the example app, but with the correct configurations for DataImportHandler and the Velocity based search UI called Solritas. Solritas is a nice tool for helping you “wedge” Solr into places by providing a simple Velocity template based translation layer, and it even lets you build a GUI within your Solr environment. Solritas hasn’t received a lot of buzz, so it’s nice seeing it turned on by default! The clustering functionality is also specified, but I’m not sure whether the solr.cluster.enabled=true startup parameter is actually required or not.

The other oddity is that the Lucid monitoring product for Solr, SolrGaze, isn’t enabled by default! Doesn’t seem like the most ringing endorsement for the software. I’m excited by the prospect of better visibility into the internals of Solr, so I enabled it.

schema.xml

Diffing the two schema.xml files reveals the addition of the Lucid KStemmer, com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory, for fast non-aggressive text stemming. According to Lucid it provides:

Large field performance shows a 220% performance increase, while small fields show a 1140% increase compared to the original UMASS code.

SolrGaze

SolrGaze promises to make it easier to see what is going on inside of Solr. Anything that makes it simpler for operations folks instead of developers to manage Solr is good in my book. I ran into one nit: when I opened up SolrGaze using the URL http://localhost:8983/gaze/index.html, it barfed connecting to Solr to display gathered metrics, but if I used http://127.0.0.1:8983/gaze/index.html then everything was fine.

I haven’t had the chance to really play with Gaze yet, so I’ll post a more in-depth review soon.

Summary

All in all, the Lucid distro would be what I would recommend for a first timer to download, or someone doing a spike of development and needing a quick install of Solr.  Not requiring Ant to be installed is a wonderful thing, and being pre-configured for Clustering, DIH, and Solritas means you get to see a working Solr install, complete with a full featured GUI, right out of the box.  In terms of using it for a production deploy, there is less to recommend it since you’re going to want to strip down to just the bits and bobs that you require for your specific needs.  I haven’t delved down into what SolrGaze provides, so that feature may be the tipping point for deciding to use the Lucid distribution.

Recap of First Two Weeks

Posted Thursday, January 14th, 2010 by Youssef Chaker

Recap of First Two Weeks post is out: http://whichrubycmsshouldiuse.com/2010/01/12/recap-of-first-two-weeks.html

Erik Hatcher, Solr Committer, reviews Solr 1.4 Enterprise Search Server

Posted Monday, January 11th, 2010 by Eric Pugh

When I first got involved in writing Solr 1.4 Enterprise Search Server I knew that one of the folks I wanted to have review the book was Erik Hatcher, a Solr committer who introduced me to the project.

He has written a very in-depth review that I’ll admit I was nervous to read! But he summed it up as:

Grand Finale
I spelled out a lot of fiddly feedback above, and I expect the great addendum wiki page will factor in any keepers from this review. Of course most of the review points out mistakes or differences of opinion, that’s what a review is for, though this is a solid, useful book. So, if you’re considering using Solr, this book is for you. If you’re already using Solr, you’ll likely pick up a useful trick or three. Go get it!

As you can see from the level of detail in his post, when we come out with a second version of the Solr book, updating it for changes between when we published it and the final release of Solr 1.4 will be very easy!

Day Two with adva-cms

Posted Friday, January 8th, 2010 by Youssef Chaker

Day Two with adva-cms is out:  http://whichrubycmsshouldiuse.com/2010/01/08/day-two-with-adva-cms.html

Day One with adva-cms

Posted Thursday, January 7th, 2010 by Youssef Chaker

Day One with adva-cms post is now available:  http://whichrubycmsshouldiuse.com/2010/01/07/day-one-with-adva-cms.html

Day Three with Radiant

Posted Wednesday, January 6th, 2010 by Youssef Chaker

Day Three with Radiant post is out:  http://whichrubycmsshouldiuse.com/2010/01/06/day-three-with-radiant.html