January 15, 2012
So you think you can scrape?
If you’re thinking about scraping a web page to extract the delicious data bits from it, ScraperWiki looks like a great place to start. It’s got tools, examples, and a community. Right now the tools are in Ruby, Python and PHP, but they’re thinking about adding JavaScript.
If I have time this weekend, I’m going to try it out by scraping the weekly Berkman Buzz post. Until a couple of weeks ago, I was fairly routinely posting the Buzz on this blog, because I had written a little scraper and formatter that let me go from the email version to the blog markup I prefer. But then those bahstahds at Berkman went all HTML on the weekly email, which completely broke my scraper. The Berkman page that lists the Buzz, though, looks ripe for trying out the ScraperWiki tools. Looking forward to it…
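In case it helps picture the kind of thing I mean, here is a minimal sketch in Python of the first step: pulling the links and their text out of a listing page. To be clear, this is not ScraperWiki's API and it is not my old scraper. It uses only the Python standard library, the sample HTML is made up, and the names LinkCollector and extract_links are just mine for illustration. The real Buzz page's markup will surely differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, invented for this sketch.
# The real Berkman Buzz listing page will look different.
SAMPLE_HTML = """
<ul class="buzz">
  <li><a href="/buzz/1">First item</a></li>
  <li><a href="/buzz/2">Second <em>item</em></a></li>
</ul>
<p>No link here.</p>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```

From there, the formatter half would just be a loop over those pairs that spits out whatever blog markup you prefer.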