logo
EverydayChaos
Everyday Chaos
Too Big to Know
Too Big to Know
Cluetrain 10th Anniversary edition
Cluetrain 10th Anniversary
Everything Is Miscellaneous
Everything Is Miscellaneous
Small Pieces cover
Small Pieces Loosely Joined
Cluetrain cover
Cluetrain Manifesto
My face
Speaker info
Who am I? (Blog Disclosure Form) Copy this link as RSS address Atom Feed

January 15, 2012

So you think you can scrape?

If you’re thinking about scraping a web page to extract the delicious data bits from it, ScraperWiki looks like a great place to start. It’s got tools, examples, and a community. Right now the tools are in Ruby, Python and PHP, but they’re thinking about adding Javascript.

If I have time this weekend, I’m going to give it a try scraping the weekly Berkman Buzz post. Until a couple of weeks ago, I was fairly routinely posting the Buzz on this blog, because I had written a little scraper and formatter that let me go from the email version to the blog markup I prefer. But then those bahstahds at Berkman went all HTML on the weekly email, which completely broke my scraper. But the Berkman page that lists the Buzz looks like it’s ripe for trying out the ScraperWiki tools. Looking forward to it…

Tweet
Follow me

Categories: tech Tagged with: scraping • tech Date: January 15th, 2012 dw

3 Comments »


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!