Mining Wikipedia

Posted on: August 6th, 2006

Evgeniy Gabrilovich and Shaul Markovitch at the Technion (Israel Institute of Technology) have written a paper about applying machine learning techniques to Wikipedia to improve automatic categorization of text. For example, by analyzing Wikipedia, their algorithms can figure out from the sentence “Wal-Mart supply chain goes real time” that Wal-Mart uses RFIDs to manage its inventory. More examples:

Given a very brief news title “Bernanke takes charge”, a casual observer can infer little information from it. However, using the algorithm we developed for consulting Wikipedia, we find out the following relevant concepts: BEN BERNANKE, FEDERAL RESERVE, CHAIRMAN OF THE FEDERAL RESERVE, ALAN GREENSPAN (Bernanke’s predecessor), MONETARISM (an economic theory of money supply and central banking), INFLATION and DEFLATION. As another example, consider the title “Apple patents a Tablet Mac”. Unless the reader is well-versed in the hi-tech industry and gadgets, she will likely find it hard to predict the contents of the news item. Using Wikipedia, we identify the following related concepts: MAC OS (the Macintosh operating system), LAPTOP (the general name for portable computers, of which Tablet Mac is a specific example), AQUA (the GUI of MAC OS X), IPOD (another prominent product by Apple), and APPLE NEWTON (the name of Apple’s early personal digital assistant).

The article then goes into the technical aspects of doing this. I’m in no position to evaluate them. But using Wikipedia as a knowledge mine is a cool idea. (Thanks to Hanan Cohen for the link.) [Tags: wikipedia everything_is_miscellaneous taxonomy ai knowledge_representation]

Categories: Uncategorized dw
