Posted on: October 17th, 2004
I’ve published a new issue of my newsletter. (You can subscribe for free, you know.)
The future of facts (and the rise of fact servers): Are facts going to become as cheap and uninteresting as styrofoam peanuts?
The Wikipedia had to freeze the George W. Bush entry a few weeks ago because people were altering it to suit their political viewpoints at an alarming rate. So, the editors pared the page down to the non-controversial "core" of facts. There was still a lot of information there — much more than merely "He was born, he drank, he became president" — and occasional acknowledgements of controversies, such as whether Bush satisfactorily completed his National Guard service. But, most interesting to me, towards the top, on the right, the Wikipedia ran one of the staples of its biographical entries: A fact box.
I find this two-tiered view of facts, quite common in reference works, fascinating. And in the context of a bottom-up work such as the Wikipedia, in the midst of a dust-up over what constitutes a factual account of the life of W, you have to ask: What’s happening to facts?…
The end of data: In the new world of classification and categorization, data and metadata are indistinguishable.
There used to be a real difference between data and metadata. Data was the suitcase and metadata was the name tag on it. Data was the folder and metadata was its label. Data was the contents of the book and metadata was the Dewey Decimal number on its spine. But in the Third Age of Order, everything is becoming metadata…
Walking the walk: O’Reilly’s foo camp is brilliant marketing in which the product is never mentioned
Cool tool: Open source Audacity sounds good
What I’m playing: Far Cry
Email: How much of an anti-Semitic misogynist was Melvil Dewey?
Bogus contest: Name the metadata bundles discussed in "The end of data" article
Categories: misc dw
Thanks for the new Joho!
Re your piece on “The end of data”, it makes more sense to me to call what’s happening “the end of *metadata*”.
Here is my logic:
Metadata was the stuff “outside the picture” that you didn’t process, because its purpose was only to record constraints on the data you did process.
More and more, those metadata constraints have become interesting in themselves, and are being treated as data that can be processed. The “meta” quality is what is coming to an end.
I feel that this better explains what I think is one of your points: the metadata used to be out of the picture because the constraints were assumed to be absolute. It’s become more and more necessary to recognize that these constraints are relative (e.g., to a culture, to a company, to a system, etc.) and therefore must be processed as data along with the data originally in the picture.
I personally think the use of the term “metadata” is often complicating, i.e., many ideas become simpler when “metadata” is replaced with the term “data”. For example (from your article):
“Now take a closer look at these information objects. They look like contents tagged with lots of metadata, but in fact they’re all metadata.”
might become:
Now take a look at these information objects–they’re all data.
Regardless of my personal take on it, I think your article is correct in asserting that the distinction between data and metadata is less and less useful.
I wanted to add:
I agree with what you’re saying about data being used to categorize other data: in the sense that this categorizing-data is called “metadata”, I think it’s correct to suggest that more and more data is becoming “metadata”.
In general, more and more data is being used for categorizing other data.
But categorizing-data is not always called metadata. It’s one particular model / tradition (albeit a very popular one) that defines information in terms of objects, and the stuff in “tags” on those objects as metadata.
This “object” view is a subset of the total picture of how data is stored and processed. And, though the object view is often a convenient way to picture things (specifically because it can be used to move things out of the picture), it’s not always the full picture in terms of how data is stored and processed.
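To make that concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from the article: the field names (`content`, `tags`, `place`, `year`) and the helpers `flatten` and `find` are made up. It takes an "object with tags" and flattens it into a plain record, at which point a query cannot tell the former "metadata" from the former "data".

```python
# Hypothetical "information object" in the object view:
# some content, plus tags that would traditionally be called metadata.
photo = {
    "content": "beach.jpg",
    "tags": {"place": "Cape Cod", "year": 2004},
}

def flatten(obj):
    """Merge the content and its tags into one flat record of plain fields."""
    record = {"content": obj["content"]}
    record.update(obj["tags"])
    return record

def find(records, field, value):
    """Return every record whose given field equals value, whatever the field is."""
    return [r for r in records if r.get(field) == value]

records = [flatten(photo)]

# The same query works on a former "tag" and on the former "content".
by_tag = find(records, "year", 2004)
by_content = find(records, "content", "beach.jpg")
```

Once the record is flat, `find` treats `year` and `content` identically; the object view is just one way of grouping the same fields.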
How about “metacontent”?
‘ethnoclassification’ reminds me of my ‘emergent markup language’ applied at a much coarser level and geared toward coarse-grained metadata for search use cases.
I think mixing auto-completion with ethnoclassification will remove some of the problems.