Data in its untamed abundance gives rise to meaning
Seb Schmoller points to a terrific article by Google’s Alon Halevy, Peter Norvig, and Fernando Pereira about two ways to get meaning out of information. Their example is machine translation of natural language: there is so much translated material available for computers to learn from that learning directly from it works better (they argue) than going up a level of abstraction and trying to categorize and conceptualize the language. Scale wins. Or, as the article says, “invariably, simple models and a lot of data trump more elaborate models based on less data.”
They then use this to distinguish the Semantic Web from “Semantic Interpretation.” The latter “deals with imprecise, ambiguous natural languages,” as opposed to aiming at data and application interoperability. “The problem of semantic interpretation remains: using a Semantic Web formalism just means that semantic interpretation must be done on shorter strings that fall between angle brackets.” Oh snap! “What we need are methods to infer relationships between column headers or mentions of entities in the world.” “Web-scale data” to the rescue! This is basically the same problem as translating from one language to another given a large enough corpus of translations: we have a Web-scale collection of tables with column headers and contents, so we should be able to algorithmically recognize concordances of meaning among them.
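To make that a bit more concrete, here is a toy sketch of the column-header idea, loosely in my own words rather than anything from the paper: the tables, threshold, and similarity measure below are all made up for illustration. The intuition is that headers which keep showing up alongside the same companion headers probably name related concepts, even if the two never appear in the same table.

```python
from collections import Counter
from itertools import combinations
import math

# A toy stand-in for a Web-scale corpus of tables: each table reduced to
# its list of column headers. All of this data is invented for illustration.
tables = [
    ["company", "ticker", "price"],
    ["firm", "ticker", "closing price"],
    ["company", "stock symbol", "price"],
    ["city", "population", "area"],
    ["town", "population", "area"],
]

# Represent each header by the other headers it co-occurs with across tables.
contexts = {}
for headers in tables:
    for h in set(headers):
        ctx = contexts.setdefault(h, Counter())
        ctx.update(other for other in set(headers) if other != h)

def cosine(a, b):
    """Cosine similarity between two headers' co-occurrence vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Headers with similar companions (e.g. "company" and "firm", "ticker" and
# "stock symbol") surface as related; a real system would add far richer
# signals, such as the cell contents under each header.
for x, y in combinations(sorted(contexts), 2):
    sim = cosine(contexts[x], contexts[y])
    if sim > 0.25:
        print(f"{x!r} ~ {y!r}  (similarity {sim:.2f})")
```

At Web scale the corpus and the statistics get vastly bigger, but the shape of the idea is the same: no ontology up front, just enough data for the concordances to fall out.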
I’m not doing the paper justice because I can’t, although it’s written quite clearly. But I find it fascinating.