Joho the Blog » Mining Wikipedia

Mining Wikipedia

Evgeniy Gabrilovich and Shaul Markovitch at the Technion (Israel Institute of Technology) have written a paper about applying machine learning techniques to Wikipedia to improve the automatic categorization of text. For example, by analyzing Wikipedia, their algorithms can figure out from the sentence “Wal-Mart supply chain goes real time” that Wal-Mart uses RFIDs to manage its inventory. More examples, from the paper:

Given a very brief news title “Bernanke takes charge”, a casual observer can infer little information from it. However, using the algorithm we developed for consulting Wikipedia, we find out the following relevant concepts: BEN BERNANKE, FEDERAL RESERVE, CHAIRMAN OF THE FEDERAL RESERVE, ALAN GREENSPAN (Bernanke’s predecessor), MONETARISM (an economic theory of money supply and central banking), INFLATION and DEFLATION. As another example, consider the title “Apple patents a Tablet Mac”. Unless the reader is well-versed in the hi-tech industry and gadgets, she will likely find it hard to predict the contents of the news item. Using Wikipedia, we identify the following related concepts: MAC OS (the Macintosh operating system), LAPTOP (the general name for portable computers, of which Tablet Mac is a specific example), AQUA (the GUI of MAC OS X), IPOD (another prominent product by Apple), and APPLE NEWTON (the name of Apple’s early personal digital assistant).
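To make the idea concrete, here is a toy sketch of how a short headline could be matched to Wikipedia-style concepts. This is not the authors' algorithm, which I haven't tried to reproduce. It is a plain TF-IDF word-overlap ranking, and the `ARTICLES` dictionary is a handful of made-up one-line stand-ins for real Wikipedia articles. The names `build_index` and `related_concepts` are mine, chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical one-line stand-ins for Wikipedia articles (not real article text).
ARTICLES = {
    "BEN BERNANKE": "bernanke economist chairman federal reserve succeeded greenspan",
    "FEDERAL RESERVE": "federal reserve central bank united states chairman monetary policy",
    "INFLATION": "inflation rise prices money supply monetary policy",
    "IPOD": "ipod portable music player apple",
    "APPLE NEWTON": "newton early personal digital assistant apple tablet",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Build an inverted index: word -> {concept: TF-IDF weight}."""
    n = len(articles)
    counts = {concept: Counter(tokenize(text)) for concept, text in articles.items()}
    # Document frequency: in how many articles does each word appear?
    df = Counter(word for cnt in counts.values() for word in cnt)
    index = {}
    for concept, cnt in counts.items():
        for word, tf in cnt.items():
            idf = math.log(n / df[word])  # rarer words count for more
            index.setdefault(word, {})[concept] = tf * idf
    return index


def related_concepts(text, index, top=3):
    """Rank concepts by the summed weights of the words they share with the text."""
    scores = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            scores[concept] += weight
    return [concept for concept, score in scores.most_common(top) if score > 0]


INDEX = build_index(ARTICLES)
print(related_concepts("Bernanke takes charge", INDEX))
print(related_concepts("Apple patents a Tablet Mac", INDEX))
```

With a corpus this tiny, the sketch only finds concepts that literally share a word with the headline. Getting from “Bernanke takes charge” to MONETARISM takes the full text of Wikipedia and the techniques the paper describes.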

The article then goes into the technical aspects of doing this. I’m in no position to evaluate them. But using Wikipedia as a knowledge mine is a cool idea. (Thanks to Hanan Cohen for the link.)
