The Three Orders
The narrative that tells of the first man and woman encountering the tree
of knowledge focuses on its tempting fruit. But after we took the bite, we
apparently looked up and got the idea that knowledge is shaped like the tree's
branching structure: Big concepts contain smaller ones that contain smaller
ones yet. Over the millennia, we have fashioned the structures of knowledge
in just such tree-like ways, from the departmental organization of universities
(liberal arts contains history and history contains ancient Chinese history)
to the hierarchy of species. The idea that knowledge is shaped like a tree
is perhaps our oldest knowledge about knowledge.
Now autumn has come to the forest of knowledge, thanks to the digital revolution.
The leaves are falling and the trees are looking bare. We are discovering
that the traditional knowledge hierarchies that have served us so well are unnecessarily
restrictive when it comes to organizing information in the digital world. The
principles of organization themselves are changing now that they are being
freed from the constraints of the physical world. For example:
- In the physical world, a fruit can hang from only one branch. In the
digital world, objects can easily be classified in dozens or even hundreds
of different categories.
- In the real world, multiple people use any one tree. In
the digital world, there can be a different tree for each person.
- In the real world, the person who owns the information generally also
owns and controls the tree that organizes that information. In the digital
world, users can control the organization of information owned by others.
(Exception to the rule: Westlaw owns the standard organization of case law
even though the case law itself is in the public domain.)
These differences are so substantial that we can think of intellectual order
as entering a third age. In the first, we organized the things themselves:
We put books on shelves and silverware into drawers. In the second, we physically
separated the metadata from the data: We built card catalogs and drew diagrams.
In the third, the data and the metadata are digital, untying organization
from the strictures of the physical world. In response, we are rapidly inventing
new principles and tools of organization. When it comes to innovation on the
Internet, metadata is becoming the new content.
But traditional taxonomic trees aren't something we can throw away without
a thought. They are an amazingly efficient way of organizing complexity because
they enable us to focus on one aspect (e.g., that's an apple) while keeping
a universe of context (it's a fruit, part of a plant, a type of living thing)
in the background, ready for access. Tree structures are built into our institutions.
They may even be built into our genes. So we are in a confusing and fertile
period as we try to sort out what works and what doesn't. Without trees, how
would we organize college curricula, business org charts, the local library,
and the order of species? How will we organize knowledge itself?
We may be on the path to finding out.
Webogeny recapitulates ontogeny
The tree of knowledge has roots, of course. They go back to Aristotle, who
figured out how knowledge could be nested without having to claim that the
container (say, the concept of human-ness) is the same sort of thing as what
it contains (all existing humans). The individual items in a hierarchy inherit the properties of all the categories above them, so that if you know that Alcibiades
is a human, you also know that he is a mammal and an animal. Inheritance provides
a context by which the individual accretes the accumulated wisdom of the tree
just by hanging on a particular branch -- an amazingly efficient way of expressing
knowledge.
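As a rough illustration of inheritance, here is a minimal Python sketch. The `PARENT` table and the `ancestors` function are hypothetical names invented for this example; the categories are the ones from the Alcibiades example above.

```python
# Hypothetical parent links: each entry points to the category directly above it.
PARENT = {
    "Alcibiades": "human",
    "human": "mammal",
    "mammal": "animal",
}


def ancestors(item):
    """Return every category above `item`, nearest first."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


# Knowing only that Alcibiades hangs on the "human" branch,
# we can read off everything else he inherits.
print(ancestors("Alcibiades"))  # ['human', 'mammal', 'animal']
```

The single fact stored about Alcibiades is his immediate branch; the rest comes from walking up the tree.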
Five hundred years later the Syrian philosopher Porphyry first drew Aristotle's
system of nested concepts as a tree. That notion
stuck, implicitly endorsed by Carl Linnaeus and Charles Darwin in the sciences,
Francis Bacon in philosophy, and by libraries and academic departments just
about everywhere.
The next stop in this story is Postmodernism's insistence that trees of knowledge
are reflections of particular cultural assumptions and, importantly, conflate
knowledge and power. You can't read Michel Foucault's The Order of Things and believe that order itself has no history. And not just French philosophers
have given up on the old dream of finding a single, universal, comprehensive
way of organizing the world's knowledge. You can't come out of Geoffrey C. Bowker and Susan Leigh Star's study of the International
Classification of Diseases, Sorting Things Out, thinking that classification
systems are value-free and objectively true. Nor can you look at the US Census'
2000 decision to expand the number of possible races without seeing that taxonomies
can have enormous political and budgetary consequences.
The brief history of the Web has recapitulated Western culture's ontogeny
of trees. Yahoo!'s directory tree became the early center of the Web, each
leaf hand-selected and placed into categories designed initially by two computer
science grad students at Stanford. But text search engines — AltaVista, HotBot,
Google — dethroned Yahoo! as the Monarch of Search, and Yahoo! in turn has
moved its browsable tree below the fold on its home page.
When text search isn't the right solution — for example, at e-commerce sites
where people may not know the names of the products they're looking for —
a more dynamic way of creating and presenting trees, called faceted classification,
is coming into its own. Invented in the early 1930s by Shiyali
Ranganathan, an Indian librarian, it applies a pre-defined set of parameters
(or facets) to its objects. For example, watches might have facets such as
manufacturer, digital or analog, men's or women's, price, and electric or
spring-driven. Some facets are a set of possible values (such as a pick-list
of available manufacturers); others are a range of numerical values (such
as price range). Users can then browse by selecting first on, say, digital
or analog and then by price, or first by price and then by men's or women's.
Users can drill down as they do with a normal tree, but the arrangement of
the branches is dynamic and reflects the users' interests, not the store's.
The store may not like it that you've routed around the $25,000 Rolex they're
offering on sale for a mere $24,000, but you've found your $50, waterproof,
analog watch much faster.
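A minimal sketch of faceted browsing, in Python, might look like the following. The catalog entries, the manufacturer names, and the `narrow` function are all made up for illustration; only the facet names come from the watch example above.

```python
# Hypothetical catalog: every value here is invented for the example.
WATCHES = [
    {"name": "dress watch", "manufacturer": "Acme", "display": "analog",
     "wearer": "men's", "price": 24000, "movement": "spring-driven"},
    {"name": "field watch", "manufacturer": "Bolt", "display": "analog",
     "wearer": "men's", "price": 50, "movement": "electric"},
    {"name": "sport watch", "manufacturer": "Bolt", "display": "digital",
     "wearer": "women's", "price": 35, "movement": "electric"},
]


def narrow(items, **facets):
    """Keep items matching every facet; a (low, high) tuple means a range."""
    def matches(item):
        for facet, wanted in facets.items():
            value = item[facet]
            if isinstance(wanted, tuple):
                low, high = wanted
                if not (low <= value <= high):
                    return False
            elif value != wanted:
                return False
        return True
    return [item for item in items if matches(item)]


# The user, not the store, picks the order of the branches.
by_display_then_price = narrow(narrow(WATCHES, display="analog"), price=(0, 100))
by_price_then_display = narrow(narrow(WATCHES, price=(0, 100)), display="analog")

print([w["name"] for w in by_display_then_price])  # ['field watch']
print(by_display_then_price == by_price_then_display)  # True
```

Either route ends at the same inexpensive analog watch, which is the point: the facets are fixed in advance, but the path through them is not.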
Faceted classification still presents users with a hierarchical tree, making
it easy for them to browse to what they want. But unlike traditional trees,
faceted systems don't decide beforehand how the branches are arranged. For
example, if an ice cream stand organized its "customer experience" around
a traditional hierarchical taxonomy — a tree — it might have a customer first
choose between two flavors, then among three sizes, and finally between a
cup or cone. There are 12 potential paths and exactly one path to a large
cup of chocolate ice cream. In a faceted system, you could browse first by
flavor, size, or container, resulting in 36 potential paths and three ways
of getting to your large cup of chocolate (or 72 paths and six ways, if the
remaining two choices can also be made in either order). Faceted systems, like trees, enable
users to navigate by continually focusing their interests, but users get to
decide how their interests are structured. This makes faceted systems very
useful where there are lots of items with easily specifiable properties and
users whose ways of browsing are difficult to predict, such as a parts catalog.
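The ice cream arithmetic can be checked by brute force. The Python sketch below is only an illustration: the flavor and size names other than "chocolate" and "large" are assumptions, as are the function names. It counts paths three ways: with a fixed facet order (the traditional tree), with a free choice of starting facet only, and with the facets taken in any order.

```python
from itertools import permutations, product

# Two flavors, three sizes, two containers; the specific values are assumed.
FACETS = {
    "flavor": ["chocolate", "vanilla"],
    "size": ["small", "medium", "large"],
    "container": ["cup", "cone"],
}

TARGET = {("flavor", "chocolate"), ("size", "large"), ("container", "cup")}


def paths(order):
    """Every root-to-leaf path when facets are chosen in the given order."""
    values = (FACETS[facet] for facet in order)
    return [tuple(zip(order, combo)) for combo in product(*values)]


def orders_by_first():
    """Facet orders where only the starting facet varies."""
    names = list(FACETS)
    return [(first, *[n for n in names if n != first]) for first in names]


def ways(all_paths):
    """How many paths end at a large cup of chocolate."""
    return sum(1 for path in all_paths if set(path) == TARGET)


fixed = paths(("flavor", "size", "container"))
start_choice = [p for order in orders_by_first() for p in paths(order)]
any_order = [p for order in permutations(FACETS) for p in paths(order)]

print(len(fixed), ways(fixed))                # 12 1
print(len(start_choice), ways(start_choice))  # 36 3
print(len(any_order), ways(any_order))        # 72 6
```

The number of distinct ice creams never changes; what grows is the number of routes a customer can take to reach one.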
The long tail of tags
Tags have become the meme of the year, at least so far, writing another chapter
in the history of classification systems. Tagging is an old idea, but it seems
to be taking off now because some applications provide end-users with immediate
benefits. For example, at del.icio.us, users enter bookmarks (URLs) they want
to remember, adding a word or two — tags — so they can sort them later. Del.icio.us
users can see not only everyone else's bookmarks, but also all the bookmarks
tagged with a particular word. For example, if you care about Emily Dickinson,
you can see all the Web pages del.icio.us users have tagged with "Dickinson"
or "Emily Dickinson," a great tool for researchers.
Traditionally, people have been loath to attach metadata to objects, because
it felt like a chore without immediate benefit. At del.icio.us and other sites
such as Flickr, a photo-sharing site, there is a strong social benefit to
tagging: We get to contribute to, and benefit from, the tagging done by others.
To lower the hurdle and encourage tagging, both sites allow us to type in
any word we want, rather than forcing us to navigate some hierarchical, controlled
vocabulary. Of course, that also makes it far harder to find relevant objects:
There's no immediate way to tell whether a photo tagged with "apple" shows
a fruit or a computer. Plus, a search for photos
tagged with "apple" will miss relevant photos tagged as "GrannySmith."
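To make the trade-off concrete, here is a small Python sketch of a free-form tag index. It is not how del.icio.us or Flickr is implemented; the photo names and the `build_index` and `find` functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical photos, each tagged with whatever words the user typed.
TAGGED = {
    "orchard.jpg": ["apple"],
    "laptop.jpg": ["apple"],
    "pie-apples.jpg": ["GrannySmith"],
}


def build_index(tagged):
    """Map each tag to the set of items carrying it."""
    index = defaultdict(set)
    for item, tags in tagged.items():
        for tag in tags:
            index[tag].add(item)
    return index


def find(index, tag):
    """Exact-match lookup: no hierarchy, no synonyms."""
    return sorted(index.get(tag, set()))


index = build_index(TAGGED)
print(find(index, "apple"))        # ['laptop.jpg', 'orchard.jpg']
print(find(index, "GrannySmith"))  # ['pie-apples.jpg']
```

The "apple" pile mixes fruit with computers, and the Granny Smith photo is missing from it, because nothing in the index knows that one tag is a kind of the other.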
Tags are a break from previous ways of categorizing. Both trees and faceted
systems specify the categories, or facets, ahead of time. They both present
users with tree-like structures for navigation, letting us climb down branches
to get to the leaf we're looking for. Tagging instead creates piles of leaves
in the hope that someone will figure out ways of putting them to use — perhaps
by hanging them on trees, but perhaps creating other useful ways of sorting,
categorizing and arranging them.
Even in these early days of tagging, we're seeing self-organizing taxonomies
emerge from the piles. For example, if you're tagging a page about an Apple
computer, you may notice that far more people use the tag "Mac" than "Macintosh."
So, if you want lots of people to find the page, you will tag it "Mac." By
using that tag, you have also increased the popularity and momentum of the
"Mac" tag. The resulting bottom-up clustering of tags has been called a folksonomy.
(It's also been called a "tagsonomy," but that's
harder to differentiate from "taxonomy" when spoken aloud.)
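The feedback loop behind a folksonomy can be sketched in a few lines of Python. The tag counts below are invented, and `most_popular` is a hypothetical helper, not a feature of any particular tagging site.

```python
from collections import Counter

# Hypothetical tags already applied by other users to pages about Apple computers.
observed = ["Mac", "Macintosh", "Mac", "Mac", "Macintosh", "Mac"]
counts = Counter(observed)


def most_popular(candidates, counts):
    """Pick the candidate tag that other people already use most."""
    return max(candidates, key=lambda tag: counts[tag])


# A new tagger who wants to be found chooses the popular tag...
choice = most_popular(["Mac", "Macintosh"], counts)
# ...and by using it, makes that tag more popular still.
counts[choice] += 1

print(choice, counts[choice])  # Mac 5
```

No one designed "Mac" as the preferred term; it wins because each tagger's choice nudges the next one.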
Folksonomies stand in sharp contrast to both trees
and faceted systems. First, folksonomies tend to
be clusters of tags, not hierarchies: There's a pile of "apple" tags and another
pile of "GrannySmith" tags, but the folksonomy may
not recognize that the latter is a subset of the former. Hierarchies can sometimes
be derived from folksonomies, but they don't have
to be. Second, trees and faceted systems are designed ahead of time, usually
by information professionals. Folksonomies grow
organically. Third, trees and faceted systems are usually owned and controlled
by the people who own the information being organized, whereas folksonomies are (so far) unowned and not centrally controlled.
Fourth, trees and faceted systems drive out ambiguity. For example, take a
page that in a tagging system carries the ambiguous tag "apple." In a tree
or faceted system, the branch it hangs from would tell you whether the page
is about computers or fruit — inheritance at work. Tagging systems are inherently
ambiguous. Trees are neat; piles of leaves are messy.
Because of these differences, the three approaches are useful in different
circumstances:
- Because they are unambiguous, trees work well where information
can be sharply delineated and is centrally controlled. Users are accustomed
to browsing trees, so little or no end-user training is required. But trees
are expensive to build and maintain and require the user to understand the
subject area well: How do you find the recipe for bread soup if you don't
know to look in the "Tuscan Cooking" category?
- Faceted systems work splendidly where an application is
being used by such a wide range of users that no one tree is going to match
everyone’s way of thinking. They are also easier to maintain than trees
because adding a new item requires only filling in the information about
the facets, rather than having to make a decision about exactly which category
it should go into.
- Tagging systems are possible only if people are motivated
to do more of the work themselves, for individual and/or social reasons.
They are necessarily sloppy systems, so if it's crucial to find each and
every object that has to do with, say, apples, tagging won't work. But for
an inexpensive, easy way of using the wisdom of the crowd to make resources
visible and sortable, there's nothing like tags.
The craft of creating and maintaining trees and faceted systems is well advanced
and well understood. Businesses have been built around them. But we don't
yet know the outcome of the current infatuation with tags. The potential is
real: If tag-mania continues, it will provide a layer of new metadata, generated
by humans for other humans, that will spur innovation
and businesses (and problems) we necessarily cannot anticipate.