Posts Tagged ‘HighTechCville’

Programmatically Capturing Web Pages as Images

Posted Tuesday, May 12th, 2009 by Eric Pugh

I don’t normally post blog articles that repost other content, but this email thread answered a question I’ve struggled with: how do you render a web page and save it as an image? I do this on HighTechCville, and did it for our Fish4Brains RailsRumble entry a couple of years ago, via thumbshots.org, but I’ve never been happy with that service:

At Sun, 3 May 2009 11:19:17 -0400,
Eric Pugh wrote:
Cool!

It’s one of those things that seems like everybody wants it, but no
one has quite figured it out. And the various “services” like
thumbshots all feel kinda “seedy”; I am always expecting to see
advertisements for viagra stamped on top of the screenshots, along
with other questionable business practices.

It seems like you should be able to have the pages rendered inside
of a library such as WebKit, but I guess rendering is very
intertwined with monitor displays, resolutions, etc.

I have a research project that aggregates info about people,
events, and organizations, and I’d love a better solution for linking
in screenshots of the organizations’ and individuals’ sites. Here is an
example using the thumbshot service for now:
http://www.hightechcville.com/organizations/318-worrell-water-technologies

Here is the text (thanks to Mark Phillips for this):

Khtml2png – http://khtml2png.sourceforge.net/ “Khtml2png is a
command line program to create screenshots of webpages. It uses
libkhtml (the library that is used in the KDE web browser Konqueror).
In khtml2png 2.0.5 to 2.5.0, “convert” from the ImageMagick graphic
conversion toolkit is used to create the output files in various
image file formats. 2.6.0 and future development will use the built-in
conversion of the Qt library.” – from the Khtml2png website

Pearl Crescent Page Saver –
http://pearlcrescent.com/products/pagesaver/ “Pearl Crescent Page
Saver is an extension for Mozilla Firefox that lets you capture
images of web pages. These images can be saved in PNG format or (with
Firefox 2) in JPEG format. The entire page or just the visible portion
may be captured. Options let you control whether images are captured
at full size (which is the default) or scaled down to a smaller size.
Page Saver uses the canvas feature that was introduced in Firefox 1.5.”
– from the Pearl Crescent Page Saver website

Webkit2png – http://www.paulhammond.org/webkit2png/ “Webkit2png is a
command line tool that creates PNG screenshots of webpages. …
webkit2png makes use of webkit, the rendering engine used in Safari.”
– from the Webkit2png website. This utility is only available for Mac
OS X because of its dependence on Safari.

WebShot – http://www.websitescreenshots.com/ “WebShot is a program
that allows you to take screenshots and thumbnails of web pages or
whole websites. It comes with a command line interface for advanced
users. The following image formats are supported: JPG, GIF, PNG, BMP.”
– from the WebShot website. WebShot uses Internet Explorer as the
engine for creating thumbnails of HTML files.

best,
Erik Hetzner
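
As a rough sketch of how one of the command-line tools above might be driven from a script, here is a small Python wrapper around webkit2png. It uses only the simplest invocation (the tool name followed by a URL) and assumes the tool writes its PNG files into the working directory. The function names, the timeout, and the output directory handling are my own illustrative choices, not part of any of the tools.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(url, tool="webkit2png"):
    """Return the argument list for the simplest call: the tool plus a URL."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return [tool, url]


def capture(url, out_dir=".", tool="webkit2png", timeout=60):
    """Run the tool inside out_dir and return the PNG files that appeared.

    Assumption: the tool saves its screenshots in the working directory.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    before = set(out.glob("*.png"))
    # Run with out_dir as the working directory so new files land there.
    subprocess.run(build_command(url, tool), cwd=out, check=True, timeout=timeout)
    return sorted(set(out.glob("*.png")) - before)
```

Comparing the directory listing before and after the run avoids guessing at the file names the tool chooses.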

Indexing Information about people needs a “time axis”

Posted Tuesday, December 9th, 2008 by Eric Pugh

I bet most people have done a vanity search on Google for themselves; I know I have. The problem with most indexing systems is that they go out and collect lots of information, but most of that information doesn’t have any sense of time. The results are just random points of data. But as the years pass and we put more of ourselves on the Internet, we mostly want to build up a picture not of ALL the data about a person, but a picture of a person based on data that is applicable RIGHT NOW. For example, in my vanity search today, my blog on JRoller comes up first. But I haven’t blogged there since March 2007, and the things I am interested in right now are better exhibited by links 2, 8, and 9: respectively, my company OpenSource Connections, Open Source in the Federal Government, and Ruby on Rails. We need to be able to cluster and show data about people, but also to plot it on a timeline, so that older data doesn’t overwhelm the newer information. On Amazon, I am still getting recommendations for Java books, even though I am a Rubyist now!
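
One way to keep older data from overwhelming newer data, sketched here in Python, is to multiply each item’s relevance by a weight that decays with age. The exponential half-life, the one-year default, and the sample scores and dates are all assumptions made for illustration; this is not how HighTechCville or any search engine actually ranks results.

```python
from datetime import date


def recency_weight(item_date, today, half_life_days=365.0):
    """Weight in (0, 1]: 1.0 for today's data, halved every half_life_days."""
    age_days = max((today - item_date).days, 0)  # future dates count as "now"
    return 0.5 ** (age_days / half_life_days)


def rank_by_recency(items, today, half_life_days=365.0):
    """Sort (label, relevance, item_date) tuples by relevance * recency weight."""
    return sorted(
        items,
        key=lambda item: item[1] * recency_weight(item[2], today, half_life_days),
        reverse=True,
    )


# Made-up relevance scores and dates, for illustration only.
results = [
    ("old blog", 1.0, date(2007, 3, 1)),
    ("company site", 0.8, date(2008, 12, 1)),
]
ranked = rank_by_recency(results, today=date(2008, 12, 9))
print([label for label, _, _ in ranked])
```

With these sample values the newer, slightly less relevant item sorts ahead of the older one. All of this presumes each item already carries a trustworthy date.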

Of course, adding “time” to data is hard. For some data you could infer the time from the context it appears in: “I graduated in 1994”. Alternatively, you could try to infer the date from when the content was created, as with an RSS article. Or, hopefully, there is some sort of meta tag specifying when it was created.
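
The three sources of dates mentioned above can be sketched in Python using only the standard library. This is a minimal illustration, not a robust extractor: the regular expression only finds four-digit years, the feed date is assumed to be in the RFC 822 style that RSS uses, the meta tag value is assumed to be an ISO 8601 string, and the priority order in `best_year` (along with taking the latest year mentioned in the text) is an arbitrary choice of mine.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# Four-digit years from 1900 to 2099 that stand alone as words.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def years_in_text(text):
    """Return every year mentioned in free text, in order of appearance."""
    return [int(year) for year in YEAR_PATTERN.findall(text)]


def date_from_feed(pub_date):
    """Parse an RSS-style date such as 'Tue, 12 May 2009 10:00:00 -0400'."""
    return parsedate_to_datetime(pub_date)


def date_from_meta(value):
    """Parse an ISO 8601 value, assumed to come from a creation-date meta tag."""
    return datetime.fromisoformat(value)


def best_year(text, pub_date=None, meta_value=None):
    """Prefer explicit metadata, then the feed date, then a year in the text."""
    if meta_value:
        return date_from_meta(meta_value).year
    if pub_date:
        return date_from_feed(pub_date).year
    years = years_in_text(text)
    return max(years) if years else None
```

A year pulled out of a sentence describes the event, not when the sentence was written, so it is the weakest of the three signals and is used only as a last resort.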

For HighTechCville, my research project, I am struggling with the fact that a lot of companies’ address data in HTC is out of date because the data source, a survey taken by CBIC, is a couple of years out of date and based on older tax records. While I am preserving metadata about when information is added and changed, that doesn’t really give me a “true” sense of what “date” goes with each data source.

A recent article on O’Reilly by Nick Bilton talks about the value of Twitter having a constant stream of information that CAN be dated, because it’s all real time “what am I interested in Right Now” and can provide that timeline of changing user data. But for a project like HTC that is trying to backwards infer that information, it’s a lot harder, and a lot fuzzier!

If you have any great suggestions, please leave them in the comments!

HighTechCville@Neon Guild 9/15/08

Posted Wednesday, September 10th, 2008 by Eric Pugh

I’ll be presenting HighTechCville to the Neon Guild next Monday, September 15.

I’ve been looking forward to this for months because most of the people information in HighTechCville comes from the Neon Guild public membership database. My initial success in finding Communities of Interest came about by looking at the over 200 people in the Neon Guild and finding 8 folks who were all technical writers! I would never have guessed that there are enough people in the Neon Guild who do technical writing to do a group dinner together!

See y’all there!

Here are directions from Debra Weiss:

Location:
Inova Solutions
110 Avon Street
Charlottesville, VA 22902

Directions from downtown Cville:
Take Market Street E to Ninth/Avon St, turn right.
Go over the bridge, get in the left lane.
Look for Spudnuts on the left.
Turn left at Spudnuts, and then take another immediate left.
Follow around, you’ll see a large brick building. That’s Inova.
Go around to the front of the building and park.
Take the elevator to the second floor. We’re in the café.