Author Archive for ‘Eric Pugh’

James Bach, the bad boy of Testing?

Posted Monday, October 26th, 2009 by Eric Pugh

So, is James Bach (@jamesmarcusbach) the bad boy of testing?

I flew up to Boston on Monday to lead some workshops on Continuous Integration. I checked into my room at the Hyatt and then went downstairs to see who was around. I ran into a couple of speakers milling about, and eventually joined one of them, and we headed over to the MIT Press bookstore, me to look for my Solr book. I wasn’t too sure of the name of the other speaker I was with (I asked once, but couldn’t quite remember what it was…). So when we got to the bookshop, I asked my fellow speaker again: James Bach. The name was familiar, but I couldn’t quite place it… I ended up buying Parentonomics, and then we went for coffee.

So, over coffee, he asked me what my topic was, and I gave him a brief summary of my two CI-related workshops. Wow… Little did I realize that I was sitting with the guy who rails against the “fetish” that Agile folks have for automated testing! That his entire approach to “testing” is to use skilled, motivated folks who do “sapient testing”. And I’m the guy who’s selling an approach that REQUIRES automated tests! That encourages expanding the use of automated testing!

He actually walked me through a process of talking about how to “think like a tester”, and it was a really great mini-workshop. He definitely subscribes to the Socratic approach and believes in his message; I was sweating at the end of it! That chat over coffee probably sparked more ideas in less time than anything else this week. I also heard a lot of ideas and phrases that were echoed in Michael Bolton’s keynote later on in the week. Clearly a lot of collaboration between the two!

Probably the biggest idea that James and I chatted about was that automated tests really aren’t tests at all; they are automated checks. They verify that the expected behavior of the code was met. His argument is that if you want to do testing, real testing, then computers and automated processes can’t meet that need; only people can.

Now, I don’t know if I believe that is completely true, but I am very aware that “manual testing”, where long test scripts written as Word documents are executed by hand by human beings, is really a waste of human potential. And those test scripts are really, to use James’s terms, “check scripts”, because the people are not using any creativity! In fact, a lot of my interest in CI comes from the idea that people should not do monkey testing, that machines can do it much better, and from my frustration with the perception that testing is a low-value activity that can easily be shipped off to low-skilled folks.

I think that this shift away from the term “test” for automated tests is actually happening in many places. In the Ruby world, we have libraries like Shoulda that are moving from words like assert to words like should. A Cucumber test really shows how controlled the space needs to be for a test to work well in an automated fashion:

Scenario: See all vendors
Given I am logged in as a user in the administrator role
And There are 3 vendors
When I go to the manage vendors page
Then I should see the first 3 vendor names

So I don’t know if I am bought in on the idea that only people can do “testing” and machines can only do “checking”. Tools like Heckle try to simulate aspects of what a human can do. While I’m not suggesting that we can automate the “does my website look okay after someone changed the CSS” type of work today, in the future our automated testing will be more capable than just “checking”, because we will move beyond the very constrained tests we have today to ones that mimic the richness of the simulators that airline pilots use. Instead of testing the training given to pilots, we’ll be testing the robustness of software via simulations!

At any rate, James Bach, while taking a rather provocative approach to sharing his ideas, does subscribe to my favorite bullet in the Agile Manifesto: Individuals and interactions over processes and tools.

Here he is giving a great presentation with the subversive title of “How to Fake a Project” that was incredibly entertaining, and also quite thought-provoking:

James Bach talking about "Faking a Test Plan"



What do you think? Is automated testing a fetish of the Agile community?

Eric Pugh to speak on Solr at Shenandoah Ruby Users Group October 27th

Posted Tuesday, October 20th, 2009 by Eric Pugh

From the Meetup site:

We’ll look at the thriving Ruby ecosystem that has grown up around integrating with Solr. From Ruby gems that integrate with Solr like solrb and rsolr, to general search solutions like acts_as_solr and sunspot. We’ll also look at a complete “shrink wrapped” catalog solution for Solr using BlacklightOPAC.

You’ll learn the basics of getting started with Solr, and gain an understanding of what Ruby solutions are available to simplify adding great search to your site!

As usual, food and beverages will be provided.

OSC will attend and sponsor EdUI Conference 2009 in the University of Vir

Posted Sunday, August 23rd, 2009 by Eric Pugh

edUI Conf

OSC is proud to announce that we will attend and sponsor this year’s EdUI Conference 2009, which is being held at the University of Virginia on 21st-22nd September 2009. A number of folks from the OSC team will be attending, so stop by our booth in the Vendor Hall on the second day and introduce yourself!

EdUI 2009 boasts a powerhouse lineup of renowned and popular headliner speakers, most often found at the Web industry’s premier events. In addition to these, it features a series of presentations, selected through a proposal process, to allow peers, colleagues, and geek kindreds to enlighten one another with their expertise and ideas. Our very own Arin Sime will be speaking on The Facebook API: Thinking About UI in a Social Way.

Solr 1.4 Enterprise Search Server Book is Released!

Posted Wednesday, August 19th, 2009 by Eric Pugh

Solr 1.4 Enterprise Search Server Book Cover

I am very proud to announce that the first book on Solr has been published by Packt. This has been a labor of love for my co-author David Smiley and me, and we are excited to see the book now “in the wild!” Below is a copy of the email sent to the Solr community:

Fellow Solr users,

I’ve finally finished the book “Solr 1.4 Enterprise Search Server” with my co-author Eric. We are proud to present the first book on Solr and hope you find it a valuable resource. You can find full details about the book and purchase it here:
http://www.packtpub.com/solr-1-4-enterprise-search-server/book
It can be pre-ordered at a discount now and should be shipping within a week or two. The book is also available through Amazon. You can feel good about the purchase knowing that 5% of each sale goes to support the Apache Software Foundation. For a free sample, there is a portion of chapter 5 covering faceting available as an article online here:
http://www.packtpub.com/article/faceting-in-solr-1.4-enterprise-search-server

By the way, we realize Solr 1.4 isn’t out [quite] yet. It is feature-frozen however, and there’s little in the forthcoming release that isn’t covered in our book. About the only notable thing that comes to mind is the contrib module on search result clustering. However Eric plans to write a free online article available from Packt Publishing on that very subject.

“Solr 1.4 Enterprise Search Server” In Detail:

If you are a developer building a high-traffic web site, you need to have a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server, which uses and extends the Lucene search library. This is the first book in the market on Solr and it will show you how to optimize your web site for high volume web traffic with full-text search capabilities along with loads of customization options. So, let your users gain a terrific search experience.

This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate it with other languages and frameworks.

This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture—where all databases and document/web crawlers fall short, and Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import this data in various ways from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including Solr’s rich query syntax, “boosting” match scores based on record data and other means, about searching across multiple fields with different boosts, getting facets on the results, auto-complete user queries, spell-correcting searches, highlighting queried text in search results, and so on.

After this thorough tour, we’ll demonstrate working examples of integrating a variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python.

Finally, we’ll cover various deployment considerations to include indexing strategies and performance-oriented configuration that will enable you to scale Solr to meet the needs of a high-volume site.

Sincerely,

David Smiley (primary-author)
dsmiley@mitre.org
Eric Pugh (co-author)
epugh@opensourceconnections.com

A huge round of thanks goes to David for bringing me into this project and being such a great partner on it! With 5% of the proceeds going to the Apache Software Foundation, here’s hoping it’s a great success!

MonkeyCI: Super light-weight Continuous Integration for small teams

Posted Wednesday, May 20th, 2009 by Eric Pugh

At OSC, we have a well developed methodology that we apply to our client work, and one of the core tenets is using Continuous Integration to ensure our code behaves the way we intend it to.

However, recently we’ve had two projects where the usual CI solutions such as CruiseControl haven’t worked out well, and we had to develop our own internal CI tool, called MonkeyCI, which we are now ready to publish to the world.

On the first project, which was a PHP based application with 5 full time developers, we used CruiseControl with the phpUnderControl addon. However, we were running CruiseControl on what turned out to be an underpowered hosted Windows server, and we kept getting build failure errors related to environmental difficulties. Now, if you’ve seen my talk about CI, you know how big I am on speccing a beefy server for CI, and this experience reinforced that lesson. We decided that migrating the CI environment to a bigger server was something that we felt was in the “nice to have” category, and that it could wait till the next iteration. But we needed something immediate. Enter MonkeyCI.

The heart of CI is building the code, running the tests, and publishing the results frequently. Everything else (the reports, the red/green lava lamps, the pretty JavaDocs, etc.) is all gravy. To meet the needs of your developers, you need to know if the “bar is green”. So MonkeyCI does that in a decidedly low-tech way:
MonkeyCI

Every time someone runs the full suite of unit tests, they record the day and time and add their initials. If the build is failing, they fix it immediately. We’ve played with writing the results in red for failing builds and green for successful builds as well. Then, each day at the standup, someone highlights how the CI is doing and verifies that multiple folks are initialing, which means that the tests are running successfully on multiple systems.

While this does mean you have an additional manual process, it’s also really easy to do, requiring just a whiteboard! And for small project teams, the overhead of maintaining a reliable CI system is too much.

We’re doing another two developer project right now, and at least so far MonkeyCI has been great. We haven’t seen integration issues yet such as database scripts that don’t run, or busted code being checked in. I’ll post a picture of our whiteboard once we have a bunch of checkoffs recorded!

We call this simple low-tech process MonkeyCI because we typically refer to anything manual, such as testing by pounding keyboards, as monkey testing. It is also somewhat of a reference to the great developers at the Primate Programming Institute, who I am sure would use this approach to CI!

Primate Programming Inc: The Evolution of Java and .NET Training


Programmatically Capturing Web pages as Images

Posted Tuesday, May 12th, 2009 by Eric Pugh

I don’t normally post blog articles that are reposts of other content, but this email thread answered a question that I’ve struggled with: how do you render a web page and save it as an image? I do this on HighTechCville, and did it for our Fish4Brains RailsRumble entry a couple of years ago, via thumbshots.org, but I’ve never been happy with that service:

At Sun, 3 May 2009 11:19:17 -0400,
Eric Pugh wrote:
Cool!

It’s one of those things that seems like everybody wants it, but no
one has quite figured out. And the various “services” like
thumbshots all feel kinda “seedy”, I am always expecting to see
advertisements for viagra stamped on top of the screenshots and
other questionable business practices.

It seems like you should be able to have the pages be rendered inside
of a library such as WebKit, but I guess rendering is very
intertwined with monitor displays and resolutions etc.

I have a research project that aggregates info about people,
events, and organizations and I’d love a better solution for linking
in screenshots of the organizations’ and individuals’ sites. Here is an
example using the thumbshot service for now..:
http://www.hightechcville.com/organizations/318-worrell-water-technologies

Here is the text (thanks to Mark Phillips for this):

Khtml2png – http://khtml2png.sourceforge.net/ “Khtml2png is a
command line program to create screenshots of webpages. It uses
libkhtml (the library that is used in the KDE web browser Konqueror).
In khtml2png 2.0.5 to 2.5.0, “convert” from the ImageMagick graphic
conversion toolkit is used to create the output files in various
image file formats. 2.6.0 and future development will use the built-in
conversion of the Qt library.” – from the Khtml2png website

Pearl Crescent Page Saver -
http://pearlcrescent.com/products/pagesaver/ “Pearl Crescent Page
Saver is an extension for Mozilla Firefox that lets you capture
images of web pages. These images can be saved in PNG format or (with
Firefox 2) in JPEG format. The entire page or just the visible portion
may be captured. Options let you control whether images are captured
at full size (which is the default) or scaled down to a smaller size.
Page Saver uses the canvas feature that was introduced in Firefox 1.5.”
– from the Pearl Crescent Page Saver website

Webkit2png – http://www.paulhammond.org/webkit2png/ “Webkit2png is a
command line tool that creates PNG screenshots of webpages. …
webkit2png makes use of webkit, the rendering engine used in Safari.”
– from the Webkit2png website. This utility is only available for Mac
OSX because of the dependence on Safari.

Webshot – http://www.websitescreenshots.com/ “WebShot is a program
that allows you to take screenshots and thumbnails of web pages or
whole websites. It comes with a command line interface for advanced
users. The following image formats are supported: JPG, GIF, PNG, BMP.”
– from the WebShot website. WebShot uses Internet Explorer as the
engine for creating thumbnails of HTML files.

best,
Erik Hetzner

Richmond SPIN on Continuous Integration

Posted Wednesday, May 6th, 2009 by Eric Pugh

I have the honor of speaking to the Richmond SPIN group on Continuous Integration next Wednesday, May 13. SPIN is the Software and Systems Process Improvement Network, a set of local groups sponsored by Carnegie Mellon University’s Software Engineering Institute.

My presentation is going to be a bit different from some of the previous talks, which have covered process improvement in general, whereas I am talking about a specific improvement that enhances your process. A CI system can provide the base framework for layering on much more than just the basic automatic code/compile/test cycle, and we’ll talk about what else it can be.

More information and registration is available at http://www.richmondspin.org/home22332.

I’m looking forward to a good crowd, lots of questions, and drinks afterwards!

A Scrum take on “Metrics and Analysis in Companies at Different Maturity Levels of the CMMI”

Posted Thursday, April 23rd, 2009 by Eric Pugh

Last month Scott, Arin, and I road-tripped to attend the Richmond SPIN meeting, where Kris Puthucode of the Software Quality Center gave a talk on “Metrics and Analysis in Companies at Different Maturity Levels of the CMMI Model”. The focus of the talk was what results you can expect out of metrics if you do the work of performing the analysis required.

I was attending the presentation with a certain sense of trepidation… I consider myself a hard-core developer (despite my “Test Obsessed” armband) who doesn’t have time or patience for pointy-headed manager paperwork. But I am also someone who focuses on process improvement and honing my craft of software development, so where can metrics inspired by CMMI be useful to a fast-moving Agile team cranking out functioning software every three weeks? So I listened to Kris and tried to think about what he said in the context of Scrum.

The first slide pointed out that there are three kinds of lies: “lies, damn lies, and statistics”. You need to be careful about what your data says. When you are measuring velocity in Scrum, you need to have a couple of sprints, at least three, to have any sense of your progress. If you say “we get X done” based on the first sprint, well, often the first one is a very conservative sprint. And, as you look at your average burndown, you need a couple of days of information before you can get a sense of it, because in the first days you often find as many tasks as you accomplish. And, over the 15 days of your iteration, progress can be pretty spiky… Many teams have a pretty flat burndown at the beginning, and then some steep drops… Ideally you are looking to see progress per day become flatter, less spiky, which would indicate that your estimating is improving. Or that your team is being less affected by external factors that might hamper their productivity.

He talked about whether to use average or median numbers when looking at a series of numbers, such as your burndown. If you have a lot of outliers, then use the median to get a better view; otherwise use the average. So if you burn down 20, 25, 22, 40, 25, then using the average is good. But if you have 8, 20, 25, 22, 40, 12, then maybe the median would be better.
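To make the comparison concrete, here is a minimal sketch that computes both summaries for the two burndown series mentioned above. The function name `summarize` is my own, and the numbers are just the ones from the talk notes, so treat it as an illustration rather than a rule for choosing between the two.

```python
from statistics import mean, median

# The two burndown series quoted above.
first_series = [20, 25, 22, 40, 25]
second_series = [8, 20, 25, 22, 40, 12]


def summarize(burndown):
    """Return the average (rounded to one decimal) and median of a series."""
    return {"mean": round(mean(burndown), 1), "median": median(burndown)}


for label, series in (("first", first_series), ("second", second_series)):
    print(label, summarize(series))
```

Looking at the two values side by side for your own sprints is a quick way to see how much a few unusual days are pulling the average around.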

Kris talked about the cultural challenge of convincing people to provide the data required to build metrics. People need to know why the numbers matter, and even better, why it will help them. It’s why developers hate filling out TPS reports! And why I like Scrum and its low overhead, as well as more passive measurement tools like HackyStat and 6thSense (now part of RallyDev). I feel like mandating one metric, say your basic time tracking, is viable, but if you add more on, you start getting more pushback or gaming of the metrics.

He talked about getting metrics like Project Start Date and Project End Date. A local company splits up its year into 3-week iterations, and then apportions iterations across competing projects. These iterations then feed the project start and end dates.

He stressed that you need to have a shared vision on metrics, and a shared vision of what “on time, in budget, with quality” really means. Just the other day we had a Scrum team debate how to track found tasks. And this was a set of people who had worked together previously on a different project, yet had different visions of tracking found tasks and how they should affect the “ideal burndown” line! Periodically revisiting that shared vision, and ensuring your team and stakeholders are all on the same page, is very valuable.

He stressed that your metrics need to be actual “measurable” things. You need to be able to quantify the metrics that you are using on a shared basis, so that when you compare two things using the same metric you are doing an apples to apples comparison, not an apples to kiwi comparison! For Scrum, it means you need to use the same time frame for iterations, and you can compare one sprint to another for a specific team on a specific project, but not across teams or projects.

Scrum for us provides very standardized metrics across the OSC organization, regardless of client or specific technology. As long as we are sharing the same vision for our metrics!

When we do a retrospective and look back at our burndowns, we are using “Progress Indicators” that are lagging indicators. One of the things that the speaker was advocating was to look at leading indicators that predict where we will be in the future. But of course, identifying and seeing a leading indicator is difficult, and takes a lot more analysis. In Scrum we would have to tie our tasks to various sprint goals, against which we estimate appropriately, to provide our velocity.

He did highlight that you need multiple projects happening to gather enough variety of data points to compare. Compare just two Scrum teams and it’s tough, because you can’t see the outliers. But if you have 10 Scrum teams who have a shared vision of the metrics, then you can start comparing them. You may have to normalize across the teams, but with enough iterations you can compare and predict.

So some things to show would be a histogram of how much the team burns down a day. Highlight what kind of deviation we have in our progress per day. Over multiple sprints you can maybe see what the first third, middle third, and final third look like. Can we characterize “At OSC, we typically see this kind of result in the thirds of the project”? Hey, does this feed into our Waterfall projects in 3 weeks?
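As a rough sketch of what that could look like, the snippet below buckets daily burndown into a text histogram, reports the standard deviation, and averages each third of the sprint. The `daily_burndown` numbers are made up for illustration (they are not OSC data), and the 10-hour bucket width and the helper names `histogram` and `thirds` are my own assumptions.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical hours burned down on each day of one 15-day sprint.
daily_burndown = [2, 0, 5, 12, 18, 22, 25, 20, 24, 40, 26, 23, 21, 30, 25]


def histogram(values, bucket_width=10):
    """Count how many days fall into each bucket of `bucket_width` hours."""
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


def thirds(values):
    """Average burndown in the first, middle, and final third of the sprint."""
    n = len(values) // 3
    return [mean(values[:n]), mean(values[n:2 * n]), mean(values[2 * n:])]


# Print one row per bucket, with one '#' per day in that bucket.
for bucket, days in histogram(daily_burndown).items():
    print(f"{bucket:>3}-{bucket + 9:<3} {'#' * days}")

print("deviation per day:", round(stdev(daily_burndown), 1))
print("thirds:", thirds(daily_burndown))
```

With real data from several sprints, the same two helpers could be run per sprint to see whether the shape of the thirds is actually consistent.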

First week is requirements solidification. Second week is development. Third week is testing and polish.

Can we figure out how to predict the results for sprint 3? We could do this for sprint 1 and 2 and predict sprint 3.

We measure progress per day as a ratio, and then sum it over a week. With that progress per week, then we can see what a sprint 3 would do.
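One way to read that idea, sketched under my own assumptions: express each day’s burndown as a fraction of the sprint total, sum those fractions per 5-day week, and average the weekly profiles of sprints 1 and 2 as a naive guess for sprint 3. The sprint numbers, the 5-day week, and the names `weekly_ratios` and `predict_next` are all hypothetical, and simple averaging is just the most basic predictor I could pick.

```python
# Hypothetical hours burned down per day in two completed 15-day sprints.
sprint_1 = [2, 0, 5, 12, 18, 22, 25, 20, 24, 40, 26, 23, 21, 30, 25]
sprint_2 = [5, 8, 10, 15, 20, 24, 22, 26, 25, 28, 24, 22, 20, 18, 16]


def weekly_ratios(daily_hours, days_per_week=5):
    """Progress per day as a ratio of the sprint total, summed per week."""
    total = sum(daily_hours)
    ratios = [hours / total for hours in daily_hours]
    return [
        sum(ratios[start:start + days_per_week])
        for start in range(0, len(ratios), days_per_week)
    ]


def predict_next(profiles):
    """Average the weekly profiles of past sprints, week by week."""
    return [sum(week) / len(profiles) for week in zip(*profiles)]


profiles = [weekly_ratios(sprint_1), weekly_ratios(sprint_2)]
prediction = predict_next(profiles)

for week, share in enumerate(prediction, start=1):
    print(f"week {week}: expect about {share:.0%} of sprint 3's work")
```

Because the ratios are relative to each sprint’s own total, sprints with different amounts of committed work can still be compared on the same scale.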