Joho the Blog » metadata

November 30, 2008

Philosophical problems with folksonomies

[Note from the next day: This is a little embarrassing. I just noticed that this was first published in 2006. It came through my inbox on Saturday, and I carelessly thought it had just come out.]

Elaine Peterson, associate professor at Montana State University, has an article in D-Lib Magazine called “Beneath the Metadata: Some Philosophical Problems with Folksonomy.” It’s good to see the issues taken seriously, and many of her premises strike me as true. But, I disagree with her pragmatic conclusion that “A traditional classification scheme will consistently provide better results to information seekers.” And I think I disagree with her philosophical critique, although I am not confident that I’m understanding it as she intends.

I read the article two different ways. At first I thought it was a critique of folksonomies on the grounds that they contradict traditional philosophical premises. The next time I read it, I thought it was simply pointing out the differences. Now I’m tending toward my first reading, in part because her section on the traditional defends it against some objections while about half of the section on folksonomies is critical of them.

Her philosophical criticism seems to be rooted in what she presents as the Aristotelian approach to classification: Things are lumped with other things like them, and simultaneously distinguished from them. Most important, she says, is the idea that “A is not B,” which means that A cannot be truthfully classified also as a B. But what about digital items that “can reside in more than one place”? That is “irrelevant,” she says, “since one is talking about a classification scheme, not about the items themselves.” I have to admit I don’t understand this. What is the philosophical basis for restricting things to one category if not that that restriction reflects the metaphysical truth that A cannot also be B? So, I think she’s saying we are to reject multiple classifications because such classifications are untrue metaphysically.

This reading is supported by the section on folksonomy, where she identifies philosophical relativism as “the underlying philosophy behind folksonomies,” and pretty clearly intends this as a criticism. (I personally am no fan of philosophical relativism, although there’s a longer story there.) The problem with relativism, she writes, is that it means classification escapes from the demand that A be A and not be B. I take this as indicating that, in her section on traditional classification, she is agreeing with the 1930 textbook she cites that recommends that classifiers give “emphasis to what the author intended to describe.” If you’re arguing that, on metaphysical grounds, things should only be classified in a single category, I guess looking for the author’s intention gives you a way forward…even though categorizing only by the author’s intent is to me like insisting that readers only underline passages that the author considers significant.

And this highlights what I think is my root disagreement with Elaine’s piece (if I’m understanding it correctly). It’s fine to raise pragmatic problems with folksonomies, as she does. But Elaine is pointing at philosophical problems. And those problems require assuming that folksonomists are trying to do what Aristotelian categorizers are trying to do. But they’re not. Aristotelians (I’m using this sloppily as shorthand, so pardon my “tagging”) are trying to find the one true and right category for each thing, creating a well-ordered system free of contradictions. Folksonomies are trying to help us find stuff.

Inconsistencies in tags actually make a folksonomy useful; a folksonomy that consists of 1,000 instances of a single tag isn’t worth the folksonomizing. But these inconsistencies are a problem for Elaine because she is thinking of a folksonomic classification as a philosophical statement rather than as a mere tool. She says that “perhaps … the strongest criticism one could make of folksonomies” is that because tags can be true for one group and false for another,

a folksonomy universe allows both true and false statements to coexist. Because tags are relativized, personal, idiosyncratic views can coexist and thrive in the form of tags, in spite of their inconsistencies. Readers of texts on the Internet become individual interpreters, despite the document author’s intent.

To this many of us will say “Hallelujah!” because we disagree with Elaine’s opening claim that all classification is about answering the philosophical question, “What is it?” Indeed, she’s a hard-liner: An inconsistency to Elaine is any multiple classification, not simply one that contradicts others. Classifying a dissertation about “Moby-Dick” under “ecology” as well as under “novels: 19th Century” would introduce an insupportable inconsistency (in Elaine’s terms). She seems to assume that tags are Aristotelian judgments in which we say that A is a B. But, when I tag a photo of my wife as “ann,” “birthday,” “2008,” and “family events,” I am not saying the essence of Ann (or her photo) is any of those things. Even if I believed in essentialism (I pretty much don’t), we could make use of Aristotle’s idea of “accidental properties” (non-essential but true) to explain what I’m doing. And if I tag Oliver Stone’s “Alexander” as “Angelina Jolie” or “tripe” knowing full well that I am not staying true to the author’s intent, well, tough on Oliver. Tags are not always truth claims, and a folksonomy is not intended to mirror nature. Indeed, a folksonomy can reveal the most appalling areas of ignorance and prejudice in a populace, and, pragmatically, we may well want to address those popular errors, especially since a folksonomy can indeed reinforce them.

But, Elaine is right to point to the philosophical implications of folksonomies. An individual folksonomy may make no claim to providing the real truth about how the world is ordered, but the use of folksonomies generally carries some philosophical implications. Elaine sees relativism underneath them while I see a form of pragmatism. But folksonomies didn’t arise out of philosophy. They are a “found” ordering: Hey, we have all these tags, so why don’t we make use of them in a more systematic way? So, I think Elaine is mislocating the philosophical moment in folksonomies. Philosophy isn’t underneath them or behind them. It’s after them, in their effect. Folksonomies reinforce our move away from the essentialist view that every thing has a single category that reflects its single and real essence. We’ve been moving away from that view for a long time as a culture. The success of folksonomies as a tool reveals that we accepted the traditional Aristotelian scheme in part because it was useful. If its utility has been undercut, then we have to ask for the other reasons we should believe in an Aristotelian metaphysics.

The ball is in Aristotle’s court.

* * *

Most of Elaine’s outright criticisms of folksonomies are actually practical, not philosophic. She makes them without empirical evidence. She has not convinced me that she’s right. For example, her final paragraph says:

A traditional classification scheme based on Aristotelian categories yields search results that are more exact. Traditional cataloging can be more time consuming, and is by definition more limiting, but it does result in consistency within its scheme. Folksonomy allows for disparate opinions and the display of multicultural views; however, in the networked world of information retrieval, a display of all views can also lead to a breakdown of the system… Most information seekers want the most relevant hits when keying in a search query.

By “exact” she apparently means the results include fewer false results (where a result is false if the search term doesn’t really apply to the result, as when you search for “fish” and get back posts about dolphins). And that seems correct: A professionally constructed index should have fewer of those sorts of mistakes. But the second criterion in her concluding paragraph is relevancy, and there folksonomies may well beat a professionally constructed index. Not only might a folksonomy retrieve results more relevant to me personally or to my cultural sub-group, but it constructs a semantic system that can retrieve results that narrow and careful categorizing by experts might miss. So, I disagree with her last sentence: “A traditional classification scheme will consistently provide better results to information seekers.” Traditional classification is best for certain types of searches (ones where you want precision over recall and relevancy, and especially where there is a confined domain of contents that you have to be sure you’ve searched thoroughly) but is not as good as a folksonomy for other types of searches.

In short, neither traditional nor folksonomic classifications are best. Each is best for something.
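To make the precision-versus-recall trade-off concrete, here is a minimal sketch in Python. The item names and both result sets are made up for illustration; nothing here comes from Elaine’s article or from any real catalog.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search for "fish": what a searcher would actually want back.
relevant = {"trout", "salmon", "cod", "tuna"}

# Hypothetical expert index: nothing wrong in the results, but some fish are missed.
catalog_results = {"trout", "salmon"}

# Hypothetical folksonomy: finds every fish, plus one dolphin tagged "fish".
tag_results = {"trout", "salmon", "cod", "tuna", "dolphin"}

catalog_pr = precision_recall(catalog_results, relevant)
tag_pr = precision_recall(tag_results, relevant)
```

In this toy case the expert index scores higher on precision and the tags score higher on recall, which is all the example is meant to show. Which matters more depends on the search.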

[Tags: folksonomy taxonomy philosophy elaine_peterson ]

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • folksonomy • libraries • metadata • philosophy • tagging • taxonomy Date: November 30th, 2008 dw

November 27, 2008

LibraryThing vs. Library of Congress

Vincent Sterken has posted his master’s thesis, which examines LibraryThing.com to understand the dynamics and utility of social tagging. It begins with an exceptionally clear backgrounder on tagging and taxonomies, and then moves to a fascinating exploration of LibraryThing’s folksonomy, including a comparison of how LibraryThing’s community and the Library of Congress classify books.

[Tags: tagging taxonomy folksonomy vincent_sterken librarything library_of_congress everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • folksonomy • libraries • librarything • metadata • tagging • taxonomy Date: November 27th, 2008 dw

November 12, 2008

Google flu interview – Request for Help

I’m going to be on the radio news show Here and Now tomorrow to talk about Google.org’s ability to track outbreaks of flu by charting search terms (“flu symptoms”), time, and presumed IP location. I plan on talking about it as an example of the power of having enormous amounts of data, and of putting to use information generated for some other purpose.
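As a rough sketch of the idea (not Google.org’s actual method, which I haven’t seen), charting a search term by time and place amounts to counting matching queries per week and region. The log rows and region codes below are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical log rows: (day, region guessed from IP, query text).
log = [
    (date(2008, 11, 3), "MA", "flu symptoms"),
    (date(2008, 11, 4), "MA", "cheap flights"),
    (date(2008, 11, 5), "NY", "flu symptoms"),
    (date(2008, 11, 10), "MA", "flu symptoms"),
    (date(2008, 11, 11), "MA", "Flu Symptoms"),
]

def weekly_counts(rows, term):
    """Count queries containing `term`, keyed by (ISO year, ISO week, region)."""
    counts = Counter()
    for day, region, query in rows:
        if term in query.lower():
            iso_year, iso_week, _ = day.isocalendar()
            counts[(iso_year, iso_week, region)] += 1
    return counts

flu = weekly_counts(log, "flu symptoms")
```

A rise in one region’s weekly count relative to its own past is the kind of signal the technique looks for. The data was generated for some other purpose; the counting is the easy part.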

Any ideas about how else this sort of technique could be used or is being used? (Amazon’s personalization is a different sort of example.) Any concerns (other than the how-not-to-do-it example from AOL)? [Tags: google flu crowd_sourcing wisdom_of_the_crowd privacy ]

Categories: Uncategorized Tagged with: digital culture • flu • folksonomy • google • marketing • metadata • privacy • web 2.0 Date: November 12th, 2008 dw

September 30, 2008

Universal academic directory

Academia.edu lets you add yourself to its gigantic Tree of University Departments. It’s a slick, slidey, Ajaxy UI, and there seem to be only benefits to adding your name to it, even though it will forever be incomplete.

The question is whether it’s easier and more beneficial to count on participants to centralize their contact info at Academia.edu or to hope that universities somehow might agree on a metadata standard (a microformat) for how they list faculty members on their own sites. Since the latter isn’t happening, the former becomes appealing. (Thanks to John Palfrey for the link.)

[Tags: academics universities taxonomy folksonomy everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: academics • education • everythingIsMiscellaneous • folksonomy • metadata • taxonomy • universities Date: September 30th, 2008 dw

September 23, 2008

Mettadatta fer dumbies

From TechPresident:

Barek Maccane for Prezidunt: Here’s a bit of silliness. Like any other perfectly normal person, I happened to be skimming the source code for JohnMcCain.com early this morning. There, I discovered the variants on the candidate’s name that programmers helpfully included in the site’s keyword meta tags in a bid to draw in sloppy spelling searchers: “John McKaine, John MacCane, Jon McCain,” and “John MacCaine.” Team Obama is also prepared for fat-fingered Googlers, with meta tags for “Barack, Barck,” and “Barek.” Thankes. Veree helpfil.
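For anyone who would rather not skim source code by eye, here is a small sketch that pulls the keyword meta tags out of a page using Python’s standard html.parser module. The page string is made up; only the misspellings are taken from the TechPresident quote.

```python
from html.parser import HTMLParser

class KeywordMetaParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

# Hypothetical page source, standing in for a real campaign site.
page = (
    '<html><head><meta name="keywords" '
    'content="John McKaine, John MacCane, Jon McCain, John MacCaine">'
    "</head></html>"
)

parser = KeywordMetaParser()
parser.feed(page)
```

After the call to feed, parser.keywords holds the four variants as separate strings.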

[Tags: metadata everything_is_miscellaneous folksonomies ]

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • folksonomies • metadata Date: September 23rd, 2008 dw

June 23, 2008

Death by tags

From BoingBoing comes this hilarious set of Amazon reviews of $500 audio cables from Denon. Best of all, BoingBoing points to the tags people have associated with the cables.

Oh, market conversations! What claims and brands won’t you take apart?

[Tags: market_conversations denon everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: cluetrain • denon • everythingIsMiscellaneous • everything_is_miscellaneous • marketing • market_conversations • metadata Date: June 23rd, 2008 dw

June 20, 2008

Microsoft says ODF has won

From Slashdot:

“At a Red Hat retrospective panel on the ODF vs. OOXML struggle panel, a Microsoft representative, Stuart McKee, admitted that ODF had ‘clearly won.’ The Redmond company is going to add native support of ODF 1.1 with its Office 2007 service pack 2. Its yet unpublished format ISO OOXML will not be supported before the release of the next Office generation. Whether or not OOXML ever gets published is an open question after four national bodies appealed the ISO decision.”

Of course, Open Document Format winning isn’t exactly the same as OOXML — the 6,000 page standard Microsoft pushed through ISO — losing. Slashdot commentators are right to be plenty skeptical. Still, this is a good thing since it opens a practical path to document interoperability in a public, open format. [Tags: odf ooxml microsoft standards ]

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • metadata • microsoft • odf • ooxml • standards Date: June 20th, 2008 dw

May 31, 2008

Andrew Hinton on info, meaning, and all the rest

Andrew Hinton has posted the annotated slides of his talk at the Information Architecture Summit. While it ultimately is aimed at IA’s who are struggling with their profession’s identity (i.e., all IA’s), Andrew’s talk is quite broad and fundamental, and also engaging and creative. Well worth the read. [Tags: ia iasummit2008 andrew_hinton information ]

Categories: Uncategorized Tagged with: digital culture • for_everythingismisc • ia • iasummit2008 • infohistory • information • metadata Date: May 31st, 2008 dw

Scan and Release: Digitizing the Boston Public Library

I’ve lived in Boston since 1986, but have never made it into the great Boston Public Library. Until today. My streak was totally broken because the little group digitizing the BPL’s holdings invited me in to see what they’re doing. And, oy, the work they have cut out for them!

But they’re an intrepid band. And they recognize that they’re up to something important. Although some in the BPL may have thought that digitized prints and photos are just lesser-quality backups, the group knows that they’re not only bringing hidden images into the public sun but are also engaged in a social project that changes how and what we know. (What’s not to love about librarians?)

The Print Stack, where photos, prints and miscellaneous other objects are stored, only seems to be in the basement. The ceiling is low, there are no windows, and the lighting leaches vitamin D out of your body. It’s long and overflowing, reminiscent of the warehouse that ends Citizen Kane, and that is echoed in two Indiana Jones movies.

[Photo: Boston Public Library Print Stack]

If you want to find a particular image in the roughly two million prints and images (no one knows for sure), you ask Aaron. Some bits and portions have catalogs of various sorts, but overall, it’s a disarray of metadata. For example, the Herald Traveler collection of photos has about 1.2 million pieces, arranged in 104 cabinets, each with four drawers. The folders and drawers are labeled, which helps a lot, but they’re not indexed, much less cross-indexed.

[Photo: Herald Traveler collection in file drawer]

At least those photos have captions. Aaron shows me some beautiful 19th century photographs of Indian architecture. Many years ago, the BPL went to enormous trouble to paste the photos into multiple volumes — turning the photos into a book, as Aaron points out — but didn’t bother to record the notes on the back of the photos. Aaron is now going to have to dissolve the pages to expose the notes.

[Photo: Aaron holds up a degraded negative. A dirigible is barely visible on it. Tough reclamation project.]

The archive doesn’t just have pictures and prints. It’s got, well, everything, including a couple of old typewriters and a collection of matchbook covers from Boston restaurants.

[Photo: Boston matchbook cover collection]

Of this abundance, the digital group has so far scanned about 24,000 objects. When I point out to Maura Marx, the group’s head, that, given the library’s estimate that it has maybe 23 million objects, she’s looking at a 2,000 year project, she tells me that they’re just getting started. They’re going to bulk up, maybe do some offsite digitizing, and begin to make some serious progress. When I ask Thomas Blake, who does the actual digitizing, how he decides which stuff to do, he laughs a little and says, “What I think is cool.” And, since the public has an appetite for “choochoo trains, maps and postcards,” he’s done a bunch of them. The BPL is, after all, a public institution that both serves the public and relies upon the public’s support.

[Photo: stacked volumes]

The Library has been posting digitized works at Flickr. Take a look at the 19th century photos of Egypt, or, yes, the postcards. And the book fetishists among you should definitely check out the “Art of the Book” collection. Predictably and hearteningly, the public (you and me, sister) have been commenting and adding to what’s known. Maura hopes to get permission to put the images into the Commons. Digitizing and posting (“scan and release,” in the group’s memorable way of putting its mission) turns patrons into historians.

The scanning is slow because it’s one guy who’s doing a careful job. The camera has a 22 megapixel chip, but they’ve been known to digitize at 88 megapixels, creating files that are half a gig in size. Tom likes saving the RAW files to avoid unnecessary data loss. You never know what’s going to be useful. For example, he had been scanning postcards at 300 dpi, but a curator pointed out that then you couldn’t see the dotscreen pattern, which might be of interest to someone. So now Tom scans them at 600 dpi. Overall, they have about 1.5 terabytes of stored images.
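The half-a-gig figure is easy to sanity-check. The sketch below takes the larger resolution to be 88 megapixels and assumes three color channels at 16 bits each, uncompressed. Both the channel count and the bit depth are my assumptions, not something Tom told me.

```python
MEGAPIXELS = 88          # the larger resolution mentioned in the post
CHANNELS = 3             # assumption: red, green, blue
BYTES_PER_CHANNEL = 2    # assumption: 16 bits per channel, uncompressed

bytes_per_image = MEGAPIXELS * 1_000_000 * CHANNELS * BYTES_PER_CHANNEL
gigabytes = bytes_per_image / 1_000_000_000
```

Under those assumptions an image comes to 528 million bytes, or roughly half a gigabyte, which squares with what the group reports.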

The metadata is a whole ‘nother issue. Chrissy Watkins, who has been there for four days — she had been at the JFK Presidential Library — is working on it. For now, Tom gives every item an arbitrary and unique ID number, the key piece of any metadata scheme. But the BPL is facing the inevitable conundrum: Maximize the metadata but slow the process, or gather less metadata but go at a far faster clip. The group seems to be leaning toward the latter, which makes sense to me. They’ve been using what Tom calls the “Curator Core,” a reference to the Dublin Core metadata standard for books. Trying to capture everything that might be useful is a task beyond daunting. For example, Michael Klein points to “fore-edge paintings,” paintings done on the edges of a book that are revealed when you fan the book slightly. Does the BPL have to come up with a standard that includes whether you fan the book to the left or right? There are so many different types of objects that building a standard or an ontology that captures them all would absorb all of the team’s time. (“The special case is not as special as you’d think,” says Michael.) Instead, they need to scan scan scan, and capture some reasonable set of metadata, to which more metadata can accrete.
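Here is one way the “unique ID plus a small core, with more accreting later” approach might look as a data structure. This is a sketch under my own assumptions, not the BPL’s system: the class name, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # arbitrary and unique; the number itself means nothing

@dataclass
class ScannedItem:
    """A scanned object with a small core of metadata and room for more."""
    title: str
    item_id: int = field(default_factory=lambda: next(_next_id))
    extra: dict = field(default_factory=dict)

    def add(self, key, value):
        """Let metadata accrete: keep every value anyone supplies for a key."""
        self.extra.setdefault(key, []).append(value)

postcard = ScannedItem(title="Postcard of a train")
book = ScannedItem(title="Book with fore-edge painting")

# A special case handled without changing any schema (hypothetical key).
book.add("fan_direction", "left")

# Something a member of the public might add later (hypothetical comment).
postcard.add("comment", "This looks like South Station")
```

The point of the design is that scanning never has to wait for the ontology. The ID is assigned at once, and everything else can arrive whenever someone knows it.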

[Photo: One of the ten Open Content Alliance book scanners.]

“We’re going from collect and hide to scan and release,” says Tom. And in so doing, the until-now unpublished holdings are going not just from no value to some value. The digital group is in fact radically multiplying the value of the Boston Public Library’s holdings. And as we the recipients of this gift incorporate the images, adding information to them, and contextualizing them, we are further enriching the holdings, far beyond what any small group, no matter how intrepid, could manage.
[Tags: libraries bpl metadata oca archives everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: archives • bpl • culture • digital culture • everythingIsMiscellaneous • folksonomy • libraries • metadata • oca • photos • tagging • taxonomy Date: May 31st, 2008 dw

May 21, 2008

Health Commons launched

Science Commons, in its relentless drive for product line expansion (I kid because I love), has posted a white paper proposing a Health Commons. In it, the authors, Marty Tenenbaum and John Wilbanks, lay out the problems and suggest a solution.

They write:

We are no longer asking whether a gene or a molecule is critical to a particular biological process; rather, we are discovering whole networks of molecular and cellular interactions that contribute to disease. And soon, we will have such information about individuals, rather than the population as a whole. Biomedical knowledge is exploding, and yet the system to capture that knowledge and translate it into saving human lives still relies on an antiquated and risky strategy of focusing the vast resources of a few pharmaceutical companies on just a handful of disease targets.

After citing more problems with the current system, the authors propose a Health Commons:

Imagine a virtual marketplace or ecosystem where participants share data, knowledge, materials and services to accelerate research. The components might include databases on the results of chemical assays, toxicity screens, and clinical trials; libraries of drugs and chemical compounds; repositories of biological materials (tissue samples, cell lines, molecules), computational models predicting drug efficacies or side effects, and contract services for high-throughput genomics and proteomics, combinatorial drug screening, animal testing, biostatistics, and more. The resources offered through the Commons might not necessarily be free, though many could be. However, all would be available under standard pre-negotiated terms and conditions and with standardized data formats that eliminate the debilitating delays, legal wrangling and technical incompatibilities that frustrate scientific collaboration today.

The paper emphasizes the need for metadata standards: “Providing such standards, Health Commons improves and extends the public domain by integrating hundreds of public databases into a single framework…” The Commons also provides the needed “social and legal infrastructure,” and a portal that provides the right set of services.

They hope that by lowering research costs, some of the 5,000 tropical diseases currently “uneconomical to address,” for example, will become the target of pharmaceutical R&D. “Health Commons makes it cost effective for small groups of researchers to conduct industrial scale R&D on rare diseases by exploiting the economies of scale afforded by an ecosystem of shared knowledge…”

The authors see the benefits going beyond the Commons’ value to non-profits. “Every pharmaceutical company sits on a wealth of promising targets and leads that they won’t develop themselves.”

The Health Commons could be a huge step forward. But it will take some work. “To realize the full potential, existing companies need to rethink their business models to leverage the commons.” As an example, the paper points out that “Only six out of the 1800 biotechnology companies funded since 1980 have made more money than was cumulatively invested in them.” Rather than counting on striking it rich with proprietary drugs discovered via proprietary R&D platforms, perhaps companies could profit by opening up their platforms and taking a cut of any drugs discovered with them.

Finally, Health Commons will provide a way to continuously publish research, along with comments, to supplement the traditional publishing model.

Health Commons can and should be a big deal. It requires lots of pieces coming together over time, but its acknowledgment of the role of profit is encouraging, and it is in the hands of serious, committed, and wickedly smart people. [Tags: health science science_commons health_commons pharma everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • health • knowledge • metadata • pharma • science Date: May 21st, 2008 dw
