Author Archive for ‘Eric Pugh’

Be Effective When Working In a Really Cool/Fun/Distracting Place

Posted Wednesday, July 7th, 2010 by Eric Pugh

Where are Eric and Youssef?

This year two OSC folks are working remotely for extended periods of time. Youssef spent 6 weeks in Lebanon earlier this year, and I am spending the month of July in the mountains of Western North Carolina. We had talked about some of the rewards as well as challenges of working effectively when you are in a new/fun/exciting/[insert superlative here] place. Youssef and I came up with a few tips to make balancing work/play easier!

Tip 1: Establish a Routine

Youssef says: There will be no shortage of distractions, especially during World Cup season. A trip to the mountains, a music festival or a friend’s unannounced (or announced) visit are all good reasons to drop what you’re doing and enjoy the place you are in. The only way to both get your work done and have fun is to know the appropriate time for each. I found that establishing a routine worked best for me. I woke up early every morning and took advantage of the quiet early hours of the day. I started work at 8am and got plenty done early on. Then I gave myself a couple of hours of break time for unscheduled events (a friend’s visit, a trip to the store to get souvenirs, or even a couple of hours at the beach). Then I resumed work until the evening, when I had a conference call scheduled every day at 5:45pm. After the call I was free to do whatever I wanted, and that gave me plenty of time, especially in a city like Beirut that does not sleep.
Eric says: I know that I sometimes struggle to get started working in the morning. And being in the office makes it simpler because everyone else is working. So having a specific schedule of when I am working helps me shift my brain out of family/vacation mode and into work mode!

When talking time zones: Bogota != Eastern Time (US & Canada)!

Posted Wednesday, April 21st, 2010 by Eric Pugh

I’ve been using the timezone localization technique of asking the browser, when the page loads, what its timezone offset from UTC is, then posting that back to the server and storing it in the session.  However, I recently noticed that with the arrival of Daylight Saving Time this was no longer working: my time would come up an hour off here in Virginia.

After much faffing about, I finally figured it out.  On the server I would ask for the set of timezones that matched the offset, and grab the first one and put that in the session:

[sourcecode]
result = ActiveSupport::TimeZone.all.select{|t|t.utc_offset == gmtoffset}.first
session[:time_zone] = result.name
[/sourcecode]

The list of named time zones returned when the browser is in Charlottesville, Virginia is: Bogota, Eastern Time (US & Canada), Indiana (East), Lima, Quito.

So when I use Bogota as the timezone, and ask Rails to show the time localized:

[ruby]
Time.now.in_time_zone(session[:time_zone])
[/ruby]

I get back a time that doesn’t take daylight saving into account, so it is an hour off. I started trying to figure out if the browser was in a DST zone using this JavaScript code: http://www.michaelapproved.com/articles/daylight-saving-time-dst-detect/ and while it seems very promising, it still wasn’t quite giving me what I wanted.

Finally, I realized it: by arbitrarily grabbing the first time zone in the list, I was showing the time in Bogota, Colombia. But if I choose Eastern Time (US & Canada), then I do get a localized time that takes daylight saving into account!

So right now I have this method:

[ruby]
result = ActiveSupport::TimeZone.all.select{|t| t.utc_offset == gmtoffset && t.name.include?("US")}.first
session[:time_zone] = result.name
[/ruby]

Obviously this is pretty hardcoded to just work in the US, and isn’t a real solution. I’d love to hear other ideas! Part of me wonders if I should just display all times in UTC in HTML, and have some sort of client side JavaScript that localizes the time display?
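One possible way past the US-only hack, sketched here under assumptions: have the browser report two offsets (one for January, one for July) and pick a zone whose standard and summer offsets both match. The ZONES table and pick_zone below are hypothetical names, and the table is a hand-written stand-in for what ActiveSupport::TimeZone would provide. It is only meant to show why a single offset is ambiguous, not to be a finished solution.

```ruby
# Hand-written stand-in for the zones Rails returned for UTC-5 (hypothetical
# table; a real app would consult ActiveSupport::TimeZone or tzinfo instead).
# Each entry: name => [standard offset, summer offset] in seconds from UTC.
ZONES = {
  "Bogota"                     => [-5 * 3600, -5 * 3600], # no DST
  "Eastern Time (US & Canada)" => [-5 * 3600, -4 * 3600], # observes DST
  "Indiana (East)"             => [-5 * 3600, -4 * 3600], # observes DST
  "Lima"                       => [-5 * 3600, -5 * 3600], # no DST
  "Quito"                      => [-5 * 3600, -5 * 3600]  # no DST
}.freeze

# Takes the browser's January and July offsets (in seconds) and returns the
# first zone name whose standard/summer pair matches, or nil if none does.
# minmax makes this work in either hemisphere: the smaller offset is standard time.
def pick_zone(january_offset, july_offset)
  standard, summer = [january_offset, july_offset].minmax
  match = ZONES.find { |_name, offsets| offsets == [standard, summer] }
  match && match.first
end

puts pick_zone(-5 * 3600, -4 * 3600) # a browser in Virginia
puts pick_zone(-5 * 3600, -5 * 3600) # a browser in Bogota
```

Zones that share the same rules (Eastern Time and Indiana (East) here) still can’t be told apart this way, so the first match wins.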

My full set of code:
JavaScript in my index.html.erb view:
[javascript]
// Calls the server and sets the user's time.
Event.observe(window, 'load', function(e) {
  var now = new Date();
  var gmtoffset = TimezoneDetect();
  // Use Ajax to set the time zone here.
  var set_time = new Ajax.Request('<%=url_for :controller => "home", :action => "gmtoffset"%>?gmtoffset=' + gmtoffset, {
    onSuccess: function(transport) {
      //alert("Response" + transport.responseText);
    }
  });
});

// http://www.michaelapproved.com/articles/daylight-saving-time-dst-detect/

function TimezoneDetect(){
  var dtDate = new Date('1/1/' + (new Date()).getUTCFullYear());
  var intOffset = 10000; //set initial offset high so it is adjusted on the first attempt
  var intMonth;
  var intHoursUtc;
  var intHours;
  var intDaysMultiplyBy;

  //go through each month to find the lowest offset to account for DST
  for (intMonth = 0; intMonth < 12; intMonth++){
    //go to the next month
    dtDate.setUTCMonth(dtDate.getUTCMonth() + 1);

    //To ignore daylight saving time look for the lowest offset.
    //Since, during DST, the clock moves forward, it'll be a bigger number.
    if (intOffset > (dtDate.getTimezoneOffset() * (-1))){
      intOffset = (dtDate.getTimezoneOffset() * (-1));
    }
  }

  return intOffset;
}

[/javascript]

home_controller.rb action:
[ruby]
def gmtoffset
  # Notice that the JavaScript version of gmtoffset is in minutes ;-)
  gmtoffset = params[:gmtoffset].to_i * 60 unless params[:gmtoffset].nil?

  result = ActiveSupport::TimeZone.all.select{|t| t.utc_offset == gmtoffset && t.name.include?("US")}.first
  session[:time_zone] = result.name

  render :update do |page|
    page.replace_html 'time_of_chat_starting', :partial => 'super_short_time'
    page.visual_effect :highlight, 'time_of_chat_starting'
  end
end
[/ruby]

Rendered partial helper view _super_short_time.erb:
[ruby]
<%= super_short_time(Time.now.in_time_zone(session[:time_zone])) %>
[/ruby]

beCamp 2010 is April 30 & May 1st

Posted Wednesday, March 31st, 2010 by Eric Pugh

beCamp 2010 is almost here! April 30th and May 1st are just four weeks away!

If you’re a geek in or around the Charlottesville metroplex or even if you’re merely tech-curious, this is the event you don’t want to miss. beCamp is Charlottesville’s version of the BarCamp unconference phenomenon—organized on the fly by attendees, for attendees. Realizing that the most energizing parts of any tech conference are the ad hoc conversations that take place in the hallways between the sessions, beCamp facilitates these types of interactions for an entire event.

As of this writing, we are at 87 campers! To participate, just add your name to the wiki page!

A big thank you to all our sponsors, including at this point,  Hotelicopter, Google, Perrin Quarles and Associates, NRAO, and University of Virginia ITC.  Interested in supporting the Cville tech community?  Check out our needs at http://barcamp.org/sponsor-beCamp-2010.

Things I learned Last Week Part 2

Posted Thursday, March 18th, 2010 by Eric Pugh

Favicon and Rails?

Want to use a favicon.ico but don’t want to put it at the root as favicon.ico?  Then add <LINK REL="SHORTCUT ICON" href="<%=image_path('limelight.ico')%>">.   The use of image_path means that Rails generates a complete image path that will work, even if you deploy under some sub URL, like a war file in JBoss!

Are you working with ISO-8859-1 encoded text? (More info at http://www.w3schools.com/tags/ref_entities.asp.)

ISO 8859-1 Characters

And look at the character À  (a capital A with a grave accent on top).  Sending that raw character will kill Solr, regardless of which container it is deployed in, whether Jetty or Tomcat.  But the other ways of representing this entity work great:

&#192; &Agrave;

Of course, there is a bit of confusion on this, as supposedly if the XML document posted to Solr is UTF-8 encoded, then Solr shouldn’t have any issues.  So, still some digging to do!
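One plausible explanation, and only a guess on my part, is a mismatch between the bytes actually sent and the encoding the document claims. A small sketch (assuming Ruby 1.9 or later, which has String encodings) shows that À is a single byte in ISO-8859-1 but two bytes in UTF-8, and that the lone ISO-8859-1 byte is not valid UTF-8 on its own:

```ruby
# À is Unicode code point U+00C0 (decimal 192).
a_grave = "\u00C0"

latin1_bytes = a_grave.encode("ISO-8859-1").bytes
utf8_bytes   = a_grave.encode("UTF-8").bytes

# Take the single ISO-8859-1 byte and pretend it is UTF-8, which is what
# happens when Latin-1 text is posted in a document declared as UTF-8.
mislabeled = [192].pack("C*").force_encoding("UTF-8")

# The numeric entity is plain ASCII, so it survives either encoding.
numeric_entity = format("&#%d;", a_grave.ord)

p latin1_bytes               # [192]
p utf8_bytes                 # [195, 128]
p mislabeled.valid_encoding? # false
puts numeric_entity          # &#192;
```

That would also fit with the entity forms working: they never put a non-ASCII byte on the wire.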

Solritas and JBoss and Velocity Oh My

I recently ran into a Java Classloader issue between JBoss and Solr when loading Velocity. If you are getting in the browser:

HTTP Status 500 – loader constraint violation: when resolving method “org.apache.velocity.Template.merge

or messages like “SEVERE: java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.velocity.Template.merge"” in the logs, then that means there is a conflict between Velocity jars.  Oddly enough, I could not actually find a velocity.jar anywhere in my JBoss app.  However, the fix was to copy the Velocity jar from Solr into my JBoss ./lib/ directory.

Things I Learned About Last Week

Posted Tuesday, March 9th, 2010 by Eric Pugh

Last week was the crucial week on my current Lucene -> Solr project for making our goals.  A lot of work the previous couple of weeks came together.  I wanted to take a couple of minutes and just record some of the little things that I’ve been learning about:

Solr

Sunspot is the up and coming solution for integrating Solr into Ruby on Rails, and fortunately enough, the 1.0 release (followed quickly by 1.0.1!) came out just last week.  Between acts_as_solr and Sunspot, Sunspot wins hands down for its support of master/slave Solr configurations, embedded Solr for testing, richer indexing semantics, and not being tied to ActiveRecord.  The companion sunspot_rails gem does give wonderful ActiveRecord integration, however.

Solr cores are the bees knees!  We’ve built a simple RoR webapp using HTTParty and the Solr API that allows you to perform all the admin functions for cores, and allows you to quickly clone a core for your own nefarious purposes!  Simplifies hacking around with a new schema or configuration without having a local copy of Solr running.  Allows multiple QA environments to potentially share a single Solr infrastructure.
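To give a flavor of what such an admin tool talks to, here is a minimal sketch of building Solr CoreAdmin request URLs. It uses only Ruby’s standard library rather than HTTParty, and the base URL, core name and instance directory are made-up values for illustration; the helper name core_admin_url is hypothetical too.

```ruby
require "uri"

# Hypothetical base URL; adjust host, port and path for your own Solr install.
SOLR_ADMIN_CORES = "http://localhost:8983/solr/admin/cores"

# Builds a CoreAdmin URL for the given action (STATUS, CREATE, RELOAD, ...)
# with any extra query parameters that action needs.
def core_admin_url(action, params = {})
  query = URI.encode_www_form({ "action" => action }.merge(params))
  "#{SOLR_ADMIN_CORES}?#{query}"
end

# Ask Solr for the status of all cores.
puts core_admin_url("STATUS")

# Register a new core whose instance directory (a copy of an existing
# core's directory, say) is already on disk. "scratch" is a made-up name.
puts core_admin_url("CREATE", "name" => "scratch", "instanceDir" => "scratch")
```

Fetching one of these URLs with any HTTP client is all it takes to drive the core, which is why a thin web app on top works so well.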

Solr master and slave setup in a single VM.  While pointless from a scaling perspective, it’s a really great way to work out the kinks!  It’s funny to see a slave core polling the same Solr VM it’s in for updated segments!

JRuby

Doesn’t suck after all.  Actually, maybe I should say that JBoss, when combined with JRuby, means that JBoss doesn’t suck so much.  I had the aforementioned Solr core admin tool bundled up as a WAR file with JRuby, and was able to deploy it to an existing environment that had JBoss installed!  I didn’t have to install Ruby on the box (or JRuby for that matter!).  I just deployed the WAR file and bam, off to the races.  Ops folks get the JBoss they love, I get the Ruby on Rails that I love.

And on a related note, Warbler was the key to thinking JRuby is cool.  I’d never actually had to package up a RoR app, so Warbler came to the rescue.  And you know what?  It was nice to build a single file that I knew had everything that I needed in it that could be scp’ed around!  And thanks to some cool code in the environment.rb, my app was able to load up the right configuration file for the environment based on an environment variable set in JBoss.
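The environment.rb trick can be sketched roughly like this. The variable name APP_CONFIG_ENV, the helper name and the config/environments layout are all assumptions for illustration, not the actual names from the app:

```ruby
# Picks a configuration file based on an environment variable.
# APP_CONFIG_ENV and the directory layout are hypothetical names.
# The env argument defaults to the real process environment, but any
# Hash-like object works, which keeps the method easy to try out.
def config_file_for(env = ENV)
  name = env.fetch("APP_CONFIG_ENV", "development")
  File.join("config", "environments", "#{name}.yml")
end

puts config_file_for("APP_CONFIG_ENV" => "qa") # variable set by the container
puts config_file_for({})                       # variable missing, use the default
```

The nice part is that the same WAR file can be dropped into any environment and the container decides which settings it picks up.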

Virtual Machines

I recently migrated a Linux VPS based RoR + Solr app (see a trend in tech choices ;-) )  to a Windows environment.  And to deliver the new Windows environment, I used VirtualBox to host the Windows Vista environment on my Mac laptop.

A couple of notes:

  • VirtualBox may not have all the snazzy integration points of Parallels with the host computer like seamless application sharing, but it seems to be much lighter weight.  Starts up quicker, and I don’t get the spinning beach ball of death as much.
  • If you are shipping an 11 GB file, you can’t use a 16 GB USB Memory Stick…  Turns out the biggest single file the stick would take (presumably because it was formatted as FAT32) is 4 GB.  (Although I never tried formatting the stick as NTFS, maybe that would have allowed a single 11 GB file???)
  • Uploading 11 GB to a remote server out on the internet will take a long long long time.  Even on a really fast network connection.
  • If you need to format an external USB hard drive as NTFS on a Mac, it is possible!  Just fire up your trusty Windows Vista image in Parallels, plug the USB drive in, download and install the correct USB drivers so the drive doesn’t show up as a network share mapped to the Mac, and then use the built in reformatting tools!  Warning: This will take a loooong time!
  • Lastly, if you are using VirtualBox, and you attempt to create a Windows XP machine, and attach a Windows Vista hard disk image to it, VirtualBox will let you!  And then Windows won’t start.  sigh.
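On the memory stick note above: assuming the stick was FAT32, which cannot hold a single file of 4 GiB or more, a quick back-of-the-envelope calculation shows how many pieces the image would have to be split into. The constant and method names are just for illustration.

```ruby
# Largest single file FAT32 can store: one byte short of 4 GiB.
FAT32_MAX_FILE_BYTES = 2**32 - 1

# Number of chunks needed so that no chunk exceeds max_bytes
# (integer ceiling division).
def chunks_needed(file_bytes, max_bytes = FAT32_MAX_FILE_BYTES)
  (file_bytes + max_bytes - 1) / max_bytes
end

image_bytes = 11 * 2**30 # treating the 11 GB image as 11 GiB
puts chunks_needed(image_bytes) # 3
```

So three pieces would have fit on the 16 GB stick with room to spare.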

Notes from using LucidWorks for Solr Distro

Posted Thursday, January 28th, 2010 by Eric Pugh

I’ve been playing with the LucidWorks for Solr distribution of Solr 1.4, and wanted to share some of the things I had noticed about it. The LucidWorks distro is Solr 1.4 with patches and enhancements from Lucid added in.

Installer

The first thing you’ll notice is that an installer (and uninstaller) is provided that walks you through the basic steps of installing Solr. Now Solr itself is pretty darn simple to work with already, but you do need to compile the code, which means you need Ant to be installed. The Lucid installer avoids that need, and adds support for running Solr in Tomcat as well as Jetty. And, assuming you have a support agreement with Lucid, it supports downloading plugins from Lucid to extend your Solr platform. Right now the only free plugin is the Reference Guide PDF. Having an installer available definitely checks a box for the systems type folks who may be installing Solr, but it doesn’t really do anything crazy special. Also, one nit is that if you install into /opt/dirA, and then want to install into /opt/dirB, you have to delete the ~/.LucidWorks/ directory, as the install dir is cached!  But it does demonstrate what might be coming from Lucid in future updates!

Installer Targets Screen


Another enhancement from Lucid is a Tray Application for managing your Solr instances. However, this turns out to just be a basic (on OSX at least!) menubar application that allows you to start/stop a local Solr server. There don’t seem to be any options to stop and start remote servers, or monitor the health of running Solrs, so I think this is something you use once and never again! Hey Lucid, it would be great though if the Tray App integrated stoplight monitoring of Solr instances and popped open web pages to admin pages to perform various tasks on your collection of Solr servers!

Directory Layout

The directory that you’ve installed Solr into should look very familiar. In fact, too familiar to me! I’ve gone back and forth on the way that Solr is distributed with source code as well as compiled jars. While Solr used to be a tool that only Java centric shops would look at, it’s now gone mainstream, to where many, if not most, organizations that use Solr are not traditional Java shops! I really wish I could download a version of Solr that didn’t have the src directory and was just a stripped-down, ready-to-go application. Admittedly, the example application that is part of the source functions as a template, but it has been bemoaned by myself and others that folks just use and abuse the configuration of what was meant as an example app, to their detriment!

So I was hoping that the LucidWorks distro’s Installer would function as that smart template by walking me through including/excluding various extensions like DIH, Clustering, and Extraction. But at least in this first version, no such luck. The support for picking either Tomcat or Jetty as a container shows what could be in the offing, though!

While the LucidWorks distro still ships with the hoary old example directory, there is now a lucidworks directory as well. When you run the new top-level start.sh shell script, it starts Solr with solr.solr.home set to the lucidworks/solr directory. Something to note is that start.sh has complete paths written into it by the installer:

[sourcecode language="text"]
cd /Users/epugh/solr/solr2/LucidWorks/lucidworks/jetty/../
[/sourcecode]

It really should at least have a single variable at the top that you can change depending on what environment you are in.

The lucidworks project is also set up as a single index project.  Since the future is multicore configurations, I’d like to see that as the default in more examples.  (The example app needs a bit of work as well to better show off multicore as a first class feature!)

solrconfig.xml

Doing a diff on the example and lucidworks versions of solrconfig.xml shows it’s pretty much the same as the one from the example app, but with the correct configurations for DataImportHandler and the Velocity based search UI called Solritas. Solritas is a nice tool for helping you “wedge” Solr into places by providing a simple Velocity template based translation layer, and even build a GUI, within your Solr environment. Solritas hasn’t received a lot of buzz, so it’s nice seeing it turned on by default! The clustering functionality is also specified, but I’m not sure if the solr.cluster.enabled=true startup parameter is actually required or not.

The other oddity is that the Lucid monitoring product for Solr, SolrGaze, isn’t enabled by default! Doesn’t seem like the most ringing endorsement for the software. I’m excited by the prospect of better visibility into the internals of Solr, so I enabled it.

schema.xml

Diffing the two schema.xml files reveals the addition of the Lucid KStemmer com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory for fast non-aggressive text stemming. According to Lucid it provides:

Large field performance shows a 220% performance increase, while small fields show a 1140% increase compared to the original UMASS code.

SolrGaze

SolrGaze promises to make it easier to see what is going on inside of Solr. Anything that makes it simpler for operations folks, rather than developers, to manage Solr is good in my book. I ran into one nit: when I opened up SolrGaze using the URL http://localhost:8983/gaze/index.html, it barfed connecting to Solr to display gathered metrics, but when I used http://127.0.0.1:8983/gaze/index.html everything was fine.

I haven’t had the chance to really play with Gaze yet, so I’ll post a more in-depth review soon.

Summary

All in all, the Lucid distro would be what I would recommend for a first timer to download, or someone doing a spike of development and needing a quick install of Solr.  Not requiring Ant to be installed is a wonderful thing, and being pre-configured for Clustering, DIH, and Solritas means you get to see a working Solr install, complete with a full featured GUI, right out of the box.  In terms of using it for a production deploy, there is less to recommend it, since you’re going to want to strip down to just the bits and bobs that you require for your specific needs.  I haven’t delved down into what SolrGaze provides, so that feature may be the tipping point for deciding to use the Lucid distribution.

Erik Hatcher, Solr Committer, reviews Solr 1.4 Enterprise Search Server

Posted Monday, January 11th, 2010 by Eric Pugh

When I first got involved in writing Solr 1.4 Enterprise Search Server I knew that one of the folks I wanted to have review the book was Erik Hatcher, a Solr committer and the person who introduced me to the project.

He has written a very in-depth review that I’ll admit I was nervous to read! But he summed it up as:

Grand Finale
I spelled out a lot of fiddly feedback above, and I expect the great addendum wiki page will factor in any keepers from this review. Of course most of the review points out mistakes or differences of opinion, that’s what a review is for, though this is a solid, useful book. So, if you’re considering using Solr, this book is for you. If you’re already using Solr, you’ll likely pick up a useful trick or three. Go get it!

As you can see from the level of detail in his post, when we come out with a second version of the Solr book, updating it for changes between when we published it and the final release of Solr 1.4 will be very easy!

Streaming Index Progress Results to Browser

Posted Friday, December 11th, 2009 by Eric Pugh

I recently needed to index several thousand static webpages from a local filesystem into Solr. I was already using Ruby on Rails for the admin interface, so I quickly threw together an action to index the documents using Hpricot and RSolr. To monitor the progress I just output to standard out using puts:
[code lang="ruby"]
def index_bulk_html
  solr = RSolr.connect :url => SOLR_URL
  count = 0
  files = Dir.glob("/Users/epugh/Documents/code/www.somesite.com/**/*.{html,htm}")
  files.each do |file|
    path_ends_at = file.index("www.somesite.com")
    unless path_ends_at.nil?
      puts("<strong>Processed #{count} of #{files.size}</strong>") if count % 100 == 0

      # Turn the local file path into the URL the page is served from
      url = "http://#{file[path_ends_at, file.size]}"
      # parse_html is a helper (not shown here) returning the title and body text
      title, page_content = parse_html(file, title, page_content)

      puts "Bad Content:#{page_content.blank?} #{url} #{title}"

      begin
        solr.add :id => url, :url => url, :mimeType => "text/html", :title => title, :docText => page_content
        solr.commit
        count = count + 1
      rescue RSolr::RequestError
        puts "<strong>Could not index #{file}</strong>"
      end
    end
  end
  puts "Imported #{count} webpages successfully."
  solr.optimize
  redirect_to root_path
end
[/code]
This worked great, but I realized that indexing over 10,000 documents takes a long time, and meanwhile the user is staring at the browser slowly loading, wondering if things had frozen or not! So I wondered if I could somehow stream some info back to the user. Fortunately Rails has already solved that problem! ActionController has the ability to render as text a proc object, and stream the output:
[code lang="ruby"] # Renders "Hello from code!"
render :text => proc { |response, output| output.write("Hello from code!") }[/code]

So I quickly wrapped my existing code in a large proc, changed the puts to output.write, and now stream out to the browser constant progress reports:
[code lang="ruby"]
def index_bulk_html
  solr = RSolr.connect :url => SOLR_URL
  count = 0
  files = Dir.glob("/Users/epugh/Documents/code/www.somesite.com/**/*.{html,htm}")
  # The proc runs while the response is being streamed to the browser
  render :text => proc { |response, output|
    files.each do |file|
      path_ends_at = file.index("www.somesite.com")
      unless path_ends_at.nil?
        output.write("<strong>Processed #{count} of #{files.size}</strong>") if count % 100 == 0

        # Turn the local file path into the URL the page is served from
        url = "http://#{file[path_ends_at, file.size]}"
        # parse_html is a helper (not shown here) returning the title and body text
        title, page_content = parse_html(file, title, page_content)

        output.write "Bad Content:#{page_content.blank?} #{url} #{title}"
        output.flush

        begin
          solr.add :id => url, :url => url, :mimeType => "text/html", :title => title, :docText => page_content
          solr.commit
          count = count + 1
        rescue RSolr::RequestError
          output.write "<strong>Could not index #{file}</strong>"
          output.flush
        end
      end
    end
    output.write "Imported #{count} webpages successfully."
    # Optimize inside the proc so it runs after indexing, not before streaming starts
    solr.optimize
  }
end
[/code]
Thank you Rails, Hpricot, and RSolr for making life so simple!