
December 9, 2011

CBC interview with me about library stuff

The CBC has posted the full, unedited interview with me (15 mins) that Nora Young did last week. We talk about the Harvard Library Lab’s two big projects, ShelfLife and LibraryCloud. (At the end, we talk a little about Too Big To Know.) The edited interview will be on the Spark program.

Categories: everythingIsMiscellaneous, libraries Tagged with: everythingIsMiscellaneous • libraries • shelflife Date: December 9th, 2011 dw

1 Comment »

November 29, 2011

[2b2k] Curation without trucks

If users of a physical library could see the thousands of ghost trucks containing all the works that the library didn’t buy backing away from the library’s loading dock, the idea of a library would seem much less plausible. Rather than seeming like a treasure trove, it would look like a relatively arbitrary reduction.

It’s not that users or librarians think there is some perfect set (although it wasn’t so long ago that picking a shelf’s worth of The Great Books seemed not only possible but laudable). Everyone is pragmatic about this. Users understand that libraries make decisions based on a mix of supporting popular tastes and educating to preferred tastes: The Iliad is going to survive being culled even though it has far fewer annual check-outs than The Girl with the Dragon Tattoo. Curating is a practical art, and libraries are good at it. But curating into a single collection that happens to fit within a library-sized building increasingly looks like a response to the weaknesses of material goods, rather than an appropriate appreciation of their cultural value. Curation has always meant identifying the exceptions, but with the new assumption of abundance, curators look for exceptions to be excluded, rather than to be included. In the Age of the Net, we’re coming to believe that just about everything deserves to be in the library for one reason or another.

It seems to me there are two challenges here. The first is redeploying the skills of curators within a hyper-abundant world that supports multiple curations without cullings. That seems to me eminently possible and valuable. The second is cultivating tastes when there are so many more paths of least cognitive and aesthetic resistance. And that is a far more difficult, even implausible, challenge.

That is, our technology makes it easy to have multiple curations equally available, but our culture wants (has wanted?) some particular curations to have priority. Unless trucks are physically removing the works outside the preferred collection, how are we going to enforce our cultural preferences?

The easy solution is to give up on the attempt. The Old White Man’s canon is dead, and good riddance. But you don’t have to love old white men to believe that culture requires education — despite what Nicolas Sarkozy believes, we don’t “naturally” love complex works of art without knowing anything about their history or context — and that education requires taking some harder paths, rather than always preferring the easier, more familiar roads. I won’t argue further for this because it’s a long discussion and I have nothing to say that you haven’t already thought. So, for the moment take it as a hypothesis.

This I think makes clear what one of the roles of the DPLA (Digital Public Library of America) should be.

Ed Summers has warned that the DPLA needs to be different from the Web. If it is simply an index of what is already available, then it has not done its job. It seems to me that even if it curates a collection of available materials it has not done its job. It is not enough to curate. It is not even enough to curate in a webby way that enables users to participate in the process. Rather, it needs to be (imo) a loosely curated assemblage that is rich in helping us not only to find what is of value, but to appreciate the value of what we find. It can do that in the traditional ways — including items in the collection, including them in special lists, providing elucidations and appreciations of the items — as well as in non-traditional, crowd-sourced, hyperlinked ways. The DPLA needs to be rich and ever richer in such tools. The curated works should become ever more embedded into a network of knowledge and appreciation.

So, yes, part of the DPLA should be that it is a huge curated collection of collections. But curation now only has reliable value if it can bring us to appreciate why those curatorial decisions were made. Otherwise, it can seem as if we’re simply looking at that which the trucks left behind.

Categories: everythingIsMiscellaneous, libraries, too big to know Tagged with: 2b2k • curation • dpla • libraries Date: November 29th, 2011 dw

4 Comments »

November 22, 2011

Physical libraries in a digital world

I’m at the final meeting of a Harvard course on the future of libraries, led by John Palfrey and Jeffrey Schnapp. They have three guests in to talk about physical library space.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

David Lamberth lays out an idea as a provocation. He begins by pointing out that until the beginning of the 20th century, a library was not a place but only a collection of books. He gives a quick history of Harvard Library. After the library burned down in 1764, the libraries lived in fear of fire until electric lights came in. The replacement library (Gore Hall) was built out of stone because brick structures need wood on the inside. But stone structures are dank, and many books had to be re-bound every 30 years. Once Gore Hall filled up, 25-30 of Harvard’s libraries derived from the search for fireproof buildings, which helps explain the large distribution of libraries across campus. They also developed more than 40 different classification systems. At the beginning of the 20th century, Harvard’s collection was just over one million volumes. Now it adds up to around 18M. [David’s presentation was not choppy, the way this paraphrase is.]

In the 1980s, there was continuing debate about what to do about the need for space. The big issue was open or closed stacks. The faculty wanted the books on site so they could be browsed. But stack space is expensive and you tend to outgrow it faster than you think. So, it was decided not to build any more stack space. There already was an offsite repository (New England Book Depository), but it was decided to build a high density storage facility to remove the non-active parts of the collection to a cheaper, off-site space: The Harvard Depository (HD).

Now more than 40% of the physical collections are at HD. The Faculty of Arts and Sciences started out hostile to the idea, but “soon became converted.” The notion faculty had of browsing the shelves was based on a fantasy: Harvard had never had all the books on a subject on a shelf in a single facility. E.g., search on “Shakespeare” in the Harvard library system: 18,000 hits. Widener Library is where you’d expect to find Shakespeare books. But 8,000 of the volumes aren’t in Widener. Of Widener’s 10K Shakespeare volumes, 4,500 are in HD. So, 25% of what you meant to browse is there. “Shelf browsing is a waste of time” if you’re trying to do thorough research. It’s a little better in the smaller libraries, but the future is not in shelf browsing. Open and closed stacks isn’t the question any more. “It’s just not possible any longer to do shelf browsing, unless we develop tools for browsing in a non-physical fashion.” E.g., catalog browsers, and ShelfLife (with StackView).

There’s nobody in the stacks any more. “It’s like the zombies have come and cleared people out.” People have new alternatives, and new habits. “But we have real challenges making sure they do as thorough research as possible, and that we leverage our collection.” About 12M of the 18M items are barcoded.

A task force saw that within 40 years, over 70% of the physical collection will be off site. HD was not designed to hold the part of the collection most people want to use. So, what can we do that will give us pedagogical and intellectual benefit, and realize the incredible resource that our collection is?

Let me present one idea, says David. The Library Task Force said emphatically that Harvard’s collection should be seen as one collection. It makes sense intellectually and financially. But that idea is in contention with the 56 physical libraries at Harvard. Also, most of our collection doesn’t circulate. Only some of it is digitally browsable, and some of that won’t change for a long long long time. E.g., our Arabic journals in Widener aren’t indexed, don’t publish cumulative indexes, and are very hard to index. Thus scholars need to be able to pull them off the shelves. Likewise for big collections of manuscripts that haven’t even been sorted yet.

One idea would be to say: Let’s treat physical libraries as one place as well. Think of them as contiguous, even though they’re not. What if bar-coded books stayed in the library you returned them to? Not shelved by a taxonomy. Random access via the digital, and it tells you where the work is. And build perfect shelves for the works that need to be physically organized. Let’s build perfect Shakespeare shelves. Put them in one building. The other less-used works will be findable, but not browsable. This would require investing in better findability systems, but it would let us get past the arbitrariness of classification systems. Already David will usually go to Amazon to decide if he wants a book rather than take the 5 mins to walk to the library. By focusing on perfect shelves for what is most important to be browsable, resources would be freed up. This might make more space in the physical libraries, so “we could think about what the people in those buildings want to be doing,” so people would come in because there’s more going on. (David notes that this model will not go over well with many of his colleagues.)

53% of library space at Harvard is stack space. The other 47% is split between patron space and staff space; about 20-25% is staff space. Comparatively, Harvard has a lower share of patron space than is typical. The HD is holding half the collection in 20% of the space. It’s 4x as expensive to store a work in the stacks on campus as off.

David responds to a question: The perfect shelves should be dynamic, not permanent. That will better serve the evolution of research. There are independent variables: Classification and shelf location. We certainly need classification, but it may not need to map to shelf locations. Widener has bibliographic lists and shelf lists. Barcodes give us more freedom; we don’t have to constantly return works to fixed locations.

Mike Barker: Students already build their own perfect shelves with carrels.

Q: What’s the case for ownership and retention if we’re only addressing temporal faculty needs?

A lot of the collecting in the first half of the 20th century was driven by faculty requests. Not now. The question of retention and purchase splits on the basis of how uncommon the piece of info is. If it’s being sold by Amazon, I don’t think it really matters if we retain it, because of the number of copies and the archival steps already in place. The rarer the work, the more we should think about purchase and retention. But under a third of the stack space on campus has ideal environmental conditions. We shouldn’t put works we buy into those circumstances unless they’re being used.

Q: At the Law Library, we’re trying to spread it out so that not everyone is buying the same stuff. E.g., we buy Peruvian materials because other libraries aren’t. And many law books are not available digitally, so we buy them … but we only buy one copy.

Yes, you’re making an assessment. In the Divinity library, Mike looked at the duplication rate. It was 53%. That is, 53% of our works are duplicated in other Harvard libraries.

Mike: How much do we spend on classification? To create call numbers? We annually spend about $1.5-2M on it, plus another million shelving it. So, $3M-3.5M total. (Mike warns that this is a “very squishy” number.) We circulate about 700,000 items a year. The total operating budget of the Library is about $152M. (He derived the classification number by asking catalogers how long it takes to classify an item without one, divided into salary.)
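Mike calls his figures “very squishy,” but even quick arithmetic on them is suggestive. A sketch using only the totals he gives ($3M-3.5M a year on classification plus shelving, about 700,000 circulations a year, a roughly $152M operating budget):

```python
# Back-of-envelope arithmetic on the figures quoted above.
total_low, total_high = 3_000_000, 3_500_000  # annual classification + shelving ($)
circulated = 700_000                          # items circulated per year
budget = 152_000_000                          # total library operating budget ($)

per_item_low = total_low / circulated    # cost per circulated item, low end
per_item_high = total_high / circulated  # cost per circulated item, high end
budget_share = total_high / budget       # share of the operating budget, high end

print(round(per_item_low, 2))              # 4.29
print(round(per_item_high, 2))             # 5.0
print(round(budget_share * 100, 1))        # 2.3
```

So classifying and shelving works out to roughly $4.30-$5.00 per circulated item, or a bit over 2% of the operating budget — squishy, but enough to make the question worth asking.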

David: Scanning in tables of contents, indexes, etc., lets people find things without having to anticipate what they’re going to be interested in.

Q: Where does serendipity fall in this? What about when you don’t know what you’re looking for?

David: I agree completely. My dissertation depended on a book that no one had checked out since 1910. I found it on the stacks. But it’s not on the shelves now. Suppose I could ask a research librarian to bring me two shelves worth of stuff because I’m beginning to explore some area.

Q: What you’re suggesting won’t work so well for students. How would not having stacks affect students?

David: I’m being provocative but concrete. The status quo is not delivering what we think it does, and it hasn’t for the past three decades.

Q: [jeff goldenson] Public librarians tell us that the recently returned trucks are the most interesting place to go. We don’t really have the ability to see what’s moving in the Harvard system. Yes, there are privacy concerns, but just showing what books have been returned would be great.

Q: [palfrey] How much does the rise of the digital affect this idea? Also, you’ve said that the storage cost of a digital object may be more than that of physical objects. How does that affect this idea?

David: Copyright law is the big If. It’s not going away. But what kind of access do you have to digital objects that you own? That’s a huge variable. I’ve premised much of what I’ve said on the working notion that we will continue to build physical collections. We don’t know how much it will cost to keep a physical object for a long time. And computer scientists all say that digital objects are not durable. My working notion here is that the parts that are really crucial are the metadata pieces, which are more easily re-buildable if you have the physical objects. We’re not going to buy physical objects for all the digital items, so the selection principle goes back to how grey or black the items are. It depends on whether we get past the engineering question about digital durability — which depends a lot on electromagnetism as a storage medium, which may be a flash in the pan. We’re moving incrementally.

Q: [me] If we can identify the high value works that go on perfect shelves, why not just skip the physical shelves and increase the amount of metadata so that people can browse them looking for the sort of info they get from going to the physical shelf?

A: David: Money. We can’t spend too much on the present at the expense of the next century or two. There’s a threshold where you’d say that it’s worth digitizing them to the degree you’d need to replace physical inspection entirely. It’s a considered judgment, which we make, for example, when we decide to digitize exhibitions. You’d want to look at the opportunity costs.

David suggests that maybe the Divinity library (he’s in the Phil Dept.) should remove some stacks to make space for in-stack work and discussion areas. (He stresses that he’s just thinking out loud.)

Matthew Sheehy, who runs HD, says they’re thinking about how to keep books 500 years. They spend $300K/year on electricity to create the right environment. They’ve invested in redundancy. But, the walls of the HD will only last 100 years. [Nov. 25: I may have gotten the following wrong:] He thinks it costs about $1/year to store a book, not the usual figure of $0.45.

Jeffrey Schnapp: We’re building a library test kitchen. We’re interested in building physical shelves that have digital lives as well.

[Nov. 25: Changed Philosophy school to Divinity, in order to make it correct. Switched the remark about the cost of physical vs. digital in the interest of truth.]

Categories: everythingIsMiscellaneous, libraries, taxonomy, too big to know Tagged with: 2b2k • everythingIsMiscellaneous • libraries • shelflife Date: November 22nd, 2011 dw

4 Comments »

October 11, 2011

Classifying folktales

Via Metafilter:

The Aarne-Thompson Classification System

Originally published by Finnish folklorist Antti Aarne and expanded by American Stith Thompson and German Hans-Jörg Uther, the Aarne-Thompson Classification System is a system for classifying folktales based on motifs.

Some examples:
Beauty and the Beast: Type 425C
Bluebeard: Type 312
The Devil Building a Bridge: Type 1191
The Foolish Use of Magic Wishes: Type 750A
Hansel and Gretel and other abandoned children: Type 327
Women forced to marry hogs: Type 441
The Runaway Pancake: Type 2025
Wikipedia has a complete breakdown, and here are examples of most of the tale types.

Categories: everythingIsMiscellaneous Tagged with: classification • everythingIsMiscellaneous Date: October 11th, 2011 dw

3 Comments »

October 4, 2011

ShelfLife and LibraryCloud: What we did all summer

We’re really really really pleased that the Digital Public Library of America has chosen two of our projects to be considered (at an Oct. 21 open plenary meeting) for implementation as part of the DPLA’s beta sprint. The Harvard Library Innovation Lab (Annie Cain, Paul Deschner, Jeff Goldenson, Matt Phillips, and Andy Silva), which I co-direct (along with Kim Dulin), worked insanely hard all summer to turn our prototypes for Harvard into services suitable for a national public library. I have to say I’m very proud of what our team accomplished, and below is a link that will let you try out what we came up with.

Upon the announcement of the beta sprint in May, we partnered up with folks at thirteen other institutions…an amazing group of people. Our small team at Harvard, with generous internal support, built ShelfLife and LibraryCloud on top of the integrated catalogs of five libraries, public and university, with a combined count of almost 15 million items, plus circulation data. We also pulled in some choice items from the Web, including metadata about every TED talk, open courseware, and Wikipedia pages about books. (Finding all or even most of the Wikipedia pages about books required real ingenuity on the part of our team, and was a fun project that we’re in the process of writing up.)

The metadata about those items goes into LibraryCloud, which collects and openly publishes that metadata via APIs and as linked open data. We’re proposing LibraryCloud to DPLA as a metadata server for the data DPLA collects, so that people can write library analytics programs, integrate library item information into other sites and apps, build recommendation and navigation systems, etc. We see this as an important way for what libraries know to become fully part of the Web ecosystem.
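This post doesn’t document LibraryCloud’s actual endpoints or response format, so purely as an illustration of the kind of client an open metadata API invites, here is a hypothetical sketch — the field names and sample records below are invented:

```python
import json

# Invented sample of what an open-metadata API response might look like;
# LibraryCloud's real schema is not documented in this post.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"title": "Hamlet", "creator": "Shakespeare, William",
         "holdings": ["Widener", "Houghton"], "checkouts": 212},
        {"title": "The Girl with the Dragon Tattoo",
         "creator": "Larsson, Stieg", "holdings": ["Lamont"], "checkouts": 987},
    ]
})

def titles_by_popularity(payload: str) -> list[str]:
    """Order item titles by circulation count, most-borrowed first."""
    items = json.loads(payload)["items"]
    return [i["title"] for i in sorted(items, key=lambda i: -i["checkouts"])]

print(titles_by_popularity(SAMPLE_RESPONSE))
```

The point is only that once item and circulation metadata are openly published as structured data, a recommendation or analytics tool becomes a few lines of code rather than a negotiation with a catalog vendor.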

ShelfLife is one of those possible recommendation and navigation systems. It is based on a few basic hypotheses:

– The DPLA should be not only a service but a place, where people can not only read and view items but also engage with other users.

– Library items do not exist on their own, but are always part of various webs. It’s helpful to be able to switch webs and contexts with minimal disruption.

– The behavior of the users of a collection of items can be a good guide to those items; we think of this as “community relevance,” and calculate it as “shelfRank.”

– The system should be easy to use but enable users to drill down or pop back up easily.

– Libraries are social systems. Library items are social objects. A library navigation system should be social as well.
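The post names shelfRank but doesn’t give its formula, so the weighting below is invented purely to illustrate the idea of a behavior-based “community relevance” score:

```python
# Toy illustration of "community relevance." The weights here are made up;
# the real shelfRank calculation is not described in this post.
def shelf_rank(checkouts: int, holdings: int, faculty_reserves: int) -> float:
    """Score an item by aggregate community behavior around it."""
    return 1.0 * checkouts + 5.0 * holdings + 10.0 * faculty_reserves

items = {
    "The Iliad": shelf_rank(checkouts=40, holdings=12, faculty_reserves=6),
    "The Girl with the Dragon Tattoo": shelf_rank(checkouts=900, holdings=3,
                                                  faculty_reserves=0),
}

# A purely behavioral score can rank a bestseller above a classic, which is
# why such a signal complements rather than replaces curatorial judgment.
print(max(items, key=items.get))
```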

Apparently the DPLA agreed enough to select ShelfLife and LibraryCloud along with five other projects out of 38 submitted proposals. The other five projects — along with another three in a “lightning round” (where the stakes are doubled and anything can happen??) — are very strong contenders and in some cases quite amazing. It seems clear to our team that there are synergies among them that we hope and assume the DPLA also recognizes. In any case, we’re honored to be in this group, and look forward to collaborating no matter what the outcome.

You can try the prototype of ShelfLife and LibraryCloud here. Keep in mind please that this is live code running on top of a database of 15M items in real time, and that it is a prototype (and in certain noted areas merely a demo or sketch). I urge you to take the tour first; there’s a lot in these two projects that you’ll miss if you don’t.

Categories: education, everythingIsMiscellaneous, libraries, taxonomy, too big to know Tagged with: 2b2k • dpla • everythingIsMiscellaneous • libraries • metadata Date: October 4th, 2011 dw

3 Comments »

September 27, 2011

Libraries of the future

We’ve just posted the latest Library Innovation Lab podcast, this one with Karen Coyle, who is a leading expert in Linked Open Data. Will we have perpetual but interoperable disagreements about how to classify and categorize works and decide what is the “same” work?

And, if you care about libraries and are in the Cambridge (MA) area on Oct. 4, there’s a kick off event at Sanders Theater at Harvard for a year of conversations about the future of libraries. Sounds great, although I unfortunately will be out of town :(

Categories: everythingIsMiscellaneous, libraries Tagged with: 2b2k • everythingIsMiscellaneous • libraries • metadata Date: September 27th, 2011 dw

1 Comment »

September 9, 2011

[2b2k] Difference matters

I still don’t know why I started getting a free subscription to Game Developer magazine, but I sure enjoy it. The technical articles are over my head and frequently completely over my head, but I enjoy reading articles written from a hard-core developer point of view. (The magazine comes to me under the name Johnny Locust at Wild West Ware — not a pseudonym or anynym of mine. I find traces of him on the Net, but none that lets me contact him directly. Johnny, if you find this, I’m enjoying your subscription!)

The magazine opener this month (Sept.) comes from Eric Caoili. It’s about The Difference Engine Initiative, an incubator to encourage and enable women as game developers. Two sessions are planned in Toronto.

One of the founders, Mare Sheppard, says in Game Developer:

“There’s this huge, homogenous, very insular, established set of developers right now in the game industry, and it happens to be mostly white and mostly male. From that, you can really only get a certain amount of innovation…If we had more voices and more opinions and more people coming in, then we would be able to take bigger steps in releasing games that represent different people, because they’re involved in the development process.”

As for the incubator, says Sheppard, “It’s like a crafter’s circle. It’s loose and low-key, and it’s about peer mentorship.” She sees it as just one step that might help some people get over the initial hurdle.

The project is named after Ada Lovelace’s contribution to Babbage’s Difference Engine, but I enjoy the implicit endorsement of difference as a source of innovation. In fact, difference is the source of all value, isn’t it?

Categories: culture, everythingIsMiscellaneous, games, too big to know Tagged with: 2b2k • games • women Date: September 9th, 2011 dw

2 Comments »

August 24, 2011

Google Books contract with the British Library

Thanks to the persistence of Javier Ruiz of the British Open Rights Group, you can now read [pdf] the contract between the British Library and Google Books. Google has shrouded its book digitization contracts in non-disclosures wrapped in lead sheathing that is then buried in collapsed portions of the Wieliczka salt mines. It took a Freedom of Information Act request by Javier to get access, and Google restricts further re-distribution.

Javier points out that the contract is non-exclusive, although the cost of re-digitizing is a barrier. Also, while the contract allows non-commercial research into the scanned corpus, Google gets to decide which research to allow. “There is also a welcome clause explicitly allowing for metadata to be included in the Europeana database,” Javier reports.

Categories: everythingIsMiscellaneous, libraries, open access Tagged with: british library • google • google books • libraries • open access • org Date: August 24th, 2011 dw

2 Comments »

August 23, 2011

The unframed Net

It’s clear that we don’t know how to explain the Internet. Is it a medium? Is it a culture, a subworld, or a parallel world? Is it a communication system? We bounce around, and we disagree.

Nevertheless, I am not as worried about our lacking the right framing for the Net as are some of my friends and colleagues.

For one thing, the same refusal to be pinned down characterizes everything. What something _is_ depends on what we’re trying to do with it, even within a culturally/linguistically homogeneous group. You can try this exercise with anything from terrorism to television to candy bars. (To pin myself down about why I think we can’t pin things down: I am sort of a phenomenological pragmatist. I also think that everything is miscellaneous, but that’s just me.)

So, we assimilate the Internet to existing concepts. There is nothing slovenly or cowardly about this. It’s how we understand things.

So, why does the Net seem special to us? Why does it seem to bust our frames ‘n’ paradigms? After all, we could assimilate the Net into older paradigms, because it is a series of tubes, and it is a communications medium, and it is a way of delivering content. Not only could we assimilate it, there are tremendous pressures to do so.

But for pragmatic (and Pragmatic) reasons, some of us (me included) don’t want to let that happen. It would foreclose cultural and political consequences we yearn for — the “we” that has flocked to the Net and that loves it for what it is and could be. The Net busts frames because it serves our purposes to have it do so.

This is why I find myself continuing to push Internet Exceptionalism, even though it does at times make me look foolish. Internet Exceptionalism is not an irrational exuberance. It is a political position. More exactly, it is a political yearning.

That’s why I’m not much bothered by the fact that we don’t have a new frame for the Net: frames are always inadequate, and the frame-busting nature of the Net serves our purposes.

In that sense, the way to frame the Internet is to keep insisting that the Net does not fit well into the old frame. Those of us who love the Net need to keep hammering on the fact that the old frames are inadequate, that the Net is exceptional, not yet assimilated to understanding, still to be invented, open to possibility, liberating of human and social potential, a framework for hope.

Eventually we’ll have the new frame for the Internet. It will be, I will boldly predict, the Internet :) In fact, open networks already are the new frame, and are sweeping aside old ways of thinking. Everything is a network.

The Internet will transition quickly from un-frameable to becoming the new frame. Until then, we should (imo) embrace the un-frameability of the Net as its framing.

Categories: everythingIsMiscellaneous, taxonomy Tagged with: everything is miscellaneous • frames • internet • pragmatism Date: August 23rd, 2011 dw

11 Comments »

July 12, 2011

Google author markup

I’m trying out Google Authorship, a pilot project that identifies online works with the author’s Google Profile. “To identify the author of an article, Google checks for a connection between the content page (such as an article), an author page, and a Google Profile.” It seems like a good, straightforward idea.

Setting it up entails using “rel” tags to mark content as yours, and to point to a page that will serve as your home page as an author. Google wants your Google Profile page to be the authenticating hub. (I continue to insist that Google should have called Google Profile “Whoogle.”) The link can point to your G Profile but can also point to one of your own pages.

I’m only slightly conflicted about this service. It puts Google Profile further at the center of the identity ecosystem. (You must have a Google Profile to use the service.) On the other hand, the “rel” links can point to your own pages. On a third hand, it helps Google search users find your posts and disambiguates authorship, both of which I’m in favor of. So, I’m trying it. (Search for my name or for the title of a blog post to see it in action.)
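The verification Google describes boils down to following rel="author" links from a page. A minimal sketch of that check, using Python’s standard-library HTML parser (the sample HTML below is invented for illustration):

```python
from html.parser import HTMLParser

# Invented sample page containing the rel="author" markup discussed above.
SAMPLE = '''
<html><body>
  <p>Post text <a rel="author" href="https://profiles.google.com/someone">me</a></p>
  <a href="/about">about</a>
</body></html>
'''

class RelAuthorFinder(HTMLParser):
    """Collect the href of every <a> that carries rel="author"."""
    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel can hold multiple space-separated tokens, so split before testing.
        if tag == "a" and "author" in (a.get("rel") or "").split():
            self.authors.append(a.get("href"))

finder = RelAuthorFinder()
finder.feed(SAMPLE)
print(finder.authors)  # ['https://profiles.google.com/someone']
```

A crawler doing authorship verification would then confirm that the linked profile points back at the content page, closing the loop between article, author page, and profile.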

Here’s a Google blog post about it.

Categories: everythingIsMiscellaneous, misc Tagged with: google • identity Date: July 12th, 2011 dw

4 Comments »



Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name (David Weinberger) and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!