Solr Index Speed on EBS

If you’ve got your Solr index on an Amazon EBS volume, save yourself some headache and do this every time you make a new volume:

sudo nohup dd if=/dev/xvdi of=/dev/null &

(Use your own volume in place of xvdi.)

That just reads the whole volume and throws the data away by sending it to /dev/null. Seems kind of dumb on the face of it, but the Amazon docs on EBS performance say there is a 5% to 50% reduction in IOPS the first time you access each block of data on a volume. I don’t know what magic happens in Amazon’s datacenter, but the solution is to read every block on the volume once, before you need it.

That’s all you have to know. If you want the backstory, read on…

We found this out the hard way when trying to pin down performance variances on new installations. Our thinking was that in order to take advantage of AutoScaling we’d want our index baked into an AMI so that we can have added query capacity in about 10 minutes (that’s about how long it takes to spin up an instance off of an AMI). If instead we opted for instance (ephemeral) storage, we’d have to wait for replication, which takes about three hours with our current index.

So this all worked well until we went to test performance. The weird thing was, we got wildly different performance results every time we created a new stack! A while ago I saw a great ops presentation (I forget who gave it) at LuceneRevolution that talked about preemptively cat’ing the index to /dev/null to prime the OS disk cache. Those keywords helped me find that EBS performance page. After doing the above dd (the name is commonly traced to the “data definition” statement in IBM’s JCL) our performance was much more predictable.

It still takes quite a bit of time to read every block on our EBS volumes. That means new instances in our AutoScaling Group will have degraded performance for a while. One thing I might try later is to have multiple processes reading from various parts of the volume in parallel.
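
Here is a rough sketch of what that parallel read might look like. I haven’t benchmarked it, so treat it as a starting point rather than a tested recipe. The function name warm_parallel, the 1 MiB block size, and the worker count are all my own assumptions; the only thing it relies on is dd’s standard bs, skip, and count operands, with each reader handed its own slice of the volume.

```shell
#!/bin/sh
# warm_parallel DEVICE WORKERS
# Hypothetical helper (not from any AWS tool): reads DEVICE in up to WORKERS
# parallel slices and discards the data, so every block gets touched once.
warm_parallel() {
    dev=$1
    workers=$2
    bs=1048576   # assumed read size of 1 MiB per dd block

    # Size in bytes. blockdev handles block devices; wc -c is a fallback
    # so the function can also be tried against a regular file.
    size=$(blockdev --getsize64 "$dev" 2>/dev/null || wc -c < "$dev")

    blocks=$(( (size + bs - 1) / bs ))           # total blocks, rounded up
    per=$(( (blocks + workers - 1) / workers ))  # blocks per reader, rounded up

    i=0
    while [ "$i" -lt "$workers" ]; do
        skip=$(( i * per ))
        # Stop once a slice would start past the end of the volume.
        [ "$skip" -lt "$blocks" ] || break
        echo "reader $i: skip=$skip count=$per"
        dd if="$dev" of=/dev/null bs="$bs" skip="$skip" count="$per" 2>/dev/null &
        i=$(( i + 1 ))
    done

    wait   # block until every background reader has finished
}
```

You would run it as root against your own device, for example warm_parallel /dev/xvdi 4. Whether four readers actually beat one depends on the volume’s IOPS and throughput limits, so measure before relying on it.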

