Joho the Blog » libraries

December 31, 2010

Happy new year, libraries!

May 2011 be the best year for libraries in a couple of millennia!

So much is going on that it could be, you know. (And how often do you get to say that?)

Categories: libraries, too big to know Tagged with: libraries Date: December 31st, 2010 dw

November 30, 2010

[bigdata] Ensuring Future Access to History

Brewster Kahle, Victoria Stodden, and Richard Cox are on a panel chaired by Jason Baron, the National Archives’ Director of Litigation. The conference is being put on by Princeton’s Center for Information Technology Policy.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Brewster goes first. He’s going to talk about “public policy in the age of digital reproduction.” “We are in a jam,” he says, because of how we have viewed our world as our tech has changed. Brewster founded the Internet Archive, a non-profit library. The aim is to make freely accessible everything ever published, from the Sumerian texts on. “Everyone everywhere ought to have access to it.” That’s a challenge worthy of our generation, he says.

He says the time is ripe for this. The Internet is becoming ubiquitous. If there aren’t laptops, there are Internet cafes. And there are mobiles. Plus, storage is getting cheaper and smaller. You can record “100 channel years” of HD TV in a petabyte for about $200,000, and store it in a small cabinet. For about $1,200, you could store all of the text in the Library of Congress. Google’s copy of the WWW is about a petabyte. The WayBack Machine uses 3 petabytes, and has about 150 billion pages. It’s used by 1.5M/day. A small organization, like the Internet Archive, can take this task on.
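
A quick back-of-the-envelope check of those figures, as a minimal Python sketch. It assumes decimal units (1 petabyte = 10^15 bytes) and, for the second calculation, that the $200,000-per-petabyte price scales linearly to smaller amounts. Neither assumption comes from Brewster’s talk, and the variable names are purely illustrative.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the talk.
# Assumption (not from the talk): decimal units, 1 PB = 10**15 bytes.
BYTES_PER_PB = 10**15
TB_PER_PB = 1000

wayback_bytes = 3 * BYTES_PER_PB   # "3 petabytes"
wayback_pages = 150 * 10**9        # "about 150 billion pages"

# Average stored size per archived page, in bytes.
bytes_per_page = wayback_bytes // wayback_pages

# Assumption: the $200,000/PB price scales linearly to smaller amounts.
cost_per_pb = 200_000
loc_text_cost = 1_200
implied_loc_text_tb = loc_text_cost * TB_PER_PB // cost_per_pb

print(bytes_per_page)        # 20000 bytes, i.e. about 20 KB per page
print(implied_loc_text_tb)   # 6, i.e. about 6 TB of text
```

Under those assumptions, the numbers imply roughly 20 KB per archived page and about 6 TB for the Library of Congress text.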

This archive is dynamic, he says. The average Web page has 15 links. The average Web page changes every 100 days.

There are downsides to the archive. E.g., the WayBack Machine gets used to enable lawsuits. We don’t want people to pull out of the public sphere. “Get archived, go to jail,” is not a useful headline. Brewster says that they once got an FBI letter asking for info, which they successfully fought (via the EFF). The Archive gets lots of lawyer letters. They get about 50 requests per week to have material taken out of the Archive. Rarely do people ask for other people’s stuff to be taken down. Once, the Scientologists wanted some copyright-infringing material taken down from someone else’s archived site; the Archive finally agreed to this. The Archive held a conference and came up with the Oakland Archive Policy for issues such as these.

Brewster points out that Jon Postel’s taxonomy is sticking: .com, .org, .gov, .edu, .mil … Perhaps we need separate policies for each of these, he says. And how do we take policy ideas and make them effective? E.g., if you put up a robots.txt exclusion, you will nevertheless get spidered by lots of people.
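
To make the robots.txt point concrete, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the crawler names (`ExampleArchiveBot`, `OtherBot`), the URLs, and the `may_fetch` helper are all hypothetical, not anything shown in the talk. The thing to notice is that the check happens inside the crawler: robots.txt is a request, and nothing stops a crawler that never asks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole
# site; everyone else is asked to stay out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url.

    A well-behaved crawler calls something like this before each request.
    A crawler that skips the call is not blocked by anything.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("ExampleArchiveBot", "http://example.org/index.html"))
print(may_fetch("OtherBot", "http://example.org/index.html"))
print(may_fetch("OtherBot", "http://example.org/private/notes.html"))
```

The first and third calls report that fetching is disallowed, the second that it is allowed. Enforcement depends entirely on the crawler choosing to run the check, which is the gap between policy and effect that Brewster is pointing at.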

“We can build the Library of Alexandria,” he concludes, “but it might be problematic.”

Q: I’ve heard people say they don’t need to archive their sites because you will.
A: Please archive your own. More copies make us safe.

Q: What do you think about the Right to Oblivion movement that says that some types of content we want to self-destruct on some schedule, e.g. Facebook.
A: I have no idea. It’s really tough. Personal info is so damn useful. I wish we could keep our computers from being used against us in court; if we defined the 5th amendment so that who “we” are included our computers…

Richard Cox says if you golf, you know about info overload. It used to be that you had one choice of golf ball, Top-Flite. Now they have twenty varieties.

Archives are full of stories waiting to be told, he says. “When I think about Big Data…most archivists would think we’re talking about big science, the corporate world, and government.” Most archivists work in small cultural, public institutions. Richard is going to talk about the shifting role of archivists.

As early as the 1940s, archivists were talking about machine-readable records. The debates and experiments have been going on for many decades. One early approach was to declare that electronic records were not archives, because the archives couldn’t deal with them. (Archivists and records managers have always been at odds, he says, because RM is about retention schedules, i.e., deleting records.) Over time, archivists came up to speed. By 2000, some were dealing with electronic records. In 2010, many do, but many do not. There is a continuing debate. Archivists have spent too long debating among themselves when they need to be talking with others. But, “archivists tend not to be outgoing folks.” (Archivists have had issues with the National Archives because their methods don’t “scale down.”)

There are many projects these days. E.g., we now have citizen archivists who maintain their own archives and who may contribute to public archives. Who are today’s archivists? Archival educators are redefining the role. Richard believes archives will continue, but the profession may not. He recommends reading the Clair report [I couldn’t get the name or the spelling, and can’t find it on Google :( ] on audio-visual archives. “I read it and I wept.” It says that we need people who understand the analog systems so that they can be preserved, but there’s no funding.

The gloomy title of Victoria Stodden’s talk is “The Coming Dark Ages in Scientific Knowledge.”

She begins by pointing to the pervasive use of computers and computational methods in the sciences, and even in the humanities and law schools. E.g., Northwestern is looking at the word counts in Shakespearean works. It’s changing the type of scientific analysis we’re doing. We can do very complicated simulations that give us a new way of understanding our world. E.g., we do simulations of math proofs, quite different from the traditional deductive processes.

This means what we’re doing as scientists is being stored in scripts, code, data, etc. But science only is science when it’s communicated. If the data and scripts are not shared, the results are not reproducible. We need to act as scientists to make sure that this data etc. are shared. How do we communicate results based on enormous data sets? We have to give access to those data sets. And what happens when those data sets change (corrected or updated)? What happens to results based on the earlier sets? We need to preserve the prior versions of the data. How do we version it? How do we share it? E.g., there’s an experiment at NSF: All proposals have to include a data management plan. The funders and journals have a strong role to play here.
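
One way to picture “preserve the prior versions” is to name each version of a data set by a hash of its contents, so that a result can cite exactly the bytes it was computed from. The following is a minimal in-memory sketch of that idea. The `VersionedDataset` class and `version_id` helper are hypothetical illustrations, not anything Victoria presented, and a real archive would also need durable storage and metadata.

```python
import hashlib


def version_id(data: bytes) -> str:
    """Derive a short identifier from the data's contents (SHA-256 prefix)."""
    return hashlib.sha256(data).hexdigest()[:12]


class VersionedDataset:
    """Keeps every committed version of a data set, keyed by content hash."""

    def __init__(self) -> None:
        self._versions: dict[str, bytes] = {}
        self.history: list[str] = []  # version ids in commit order

    def commit(self, data: bytes) -> str:
        """Store data and return its id; identical content is stored once."""
        vid = version_id(data)
        if vid not in self._versions:
            self._versions[vid] = data
            self.history.append(vid)
        return vid

    def get(self, vid: str) -> bytes:
        """Return the exact bytes committed under this id."""
        return self._versions[vid]


dataset = VersionedDataset()
original = dataset.commit(b"sample,value\na,1\n")
corrected = dataset.commit(b"sample,value\na,2\n")

# A paper that cites `original` can still be checked after the correction.
print(original != corrected)
print(dataset.get(original))
```

Because the identifier is derived from the content, a corrected data set automatically gets a new id while the earlier one stays retrievable, so results based on the earlier set remain checkable.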

Sharing scientific knowledge is harder than it sounds, but is vital. E.g., a recent study showed that a cancer therapy will be particularly effective based on individual genomes. But it was extremely hard to trace back the data and code used to get this answer. Victoria notes that peer reviewers do not check the data and algorithms.

Why a dark age? Because “without reproducibility, knowledge cannot be recreated or understood.” We need ways and processes of sharing. Without this, we only have scientists making proclamations.

She gives some recommendations: (1) Assessment of the expense of data/code archiving. (2) Enforcement of funding agency guidelines. (3) Publication requirements. (4) Standards for scientific tools. (5) Versioning as a scientific principle. (6) Licensing to realign scientific intellectual property with longstanding scientific norms (Reproducible Research Standard). [verbatim from her slide] Victoria stresses the need to get past the hurdles copyright puts in the way.

Q: Are you a pessimist?
A: I’m an optimist. The scientific community is aware of these issues and is addressing them.

Q: Do we need an IRS for the peer review process?
A: Even just the possibility that someone could look at your code and data is enough to make scientists very aware of what they’re doing. I don’t advocate code checking as part of peer review because it takes too long. Instead, throw your paper out into the public while it’s still being reviewed and let other scientists have at it.

Q: [rick] Every age has lost more info than it has preserved. This is not a new problem. Every archivist from the beginning of time has had to cope with this.

Jason Baron of the National Archives (who is not speaking officially) points to the volume of data the National Archives (NARA) has to deal with. E.g., in 2001 32 million emails were transferred to NARA; in 2009, 250+ million were. He predicts there will be a billion presidential emails held at NARA by 2017. The first lawsuit over email was filed in 1989 (email=PROFS). Right now, the official policy of 300 govt agencies is to print email out for archiving. We can no longer deal with the info flow with manual processes. Processing of printed pages occurs when there’s a lawsuit or a FOIA request. Jason is pushing on the value of search as a way of encouraging systematic intake of digital records. He dreams of search algorithms that retrieve all relevant materials. There are clustering algorithms emerging within law that hold hope. He also wants to retrieve docs other than via key words. Visual analytics can help.

There are three languages we need: Legal, Records Management, and IT. How do we make the old ways work in the new? We need not only new filtering techniques but also traditional notions of appraisal. “The neutral archivist may serve as an unbiased resource for the filtering of information in an increasingly partisan (untrustworthy) world” [from the slide].

Categories: libraries, science, too big to know Tagged with: 2b2k • archives • bigdata • libraries • science Date: November 30th, 2010 dw

November 13, 2010

Doc around the clock, and around the world

Doc Searls has a brief post about wandering his way around the world and across the decades thanks to librarians, archivists, and the good folks of New Zealand.

Btw, be sure to click on the link to what Doc calls his “favorite family photo of all time.” OMG, he looks exactly the same.

Categories: libraries, open access Tagged with: doc searls • libraries Date: November 13th, 2010 dw

October 27, 2010

O failure, where is thy sting?

The Twitter hashtag #FailShare is accumulating instances of failed library projects, so that we can learn from them, and also, I imagine, to take the sting out of failure (on the grounds that sting-y failure makes for stingy ideas).

And, a brand new wiki page has gone up on the same topic.

(BTW, while on the topic of bad/good puns, Frank Nugent once dismissed a disappointing play by the American playwright Clifford Odets with the line, “Odets, where is thy sting?”)

Categories: libraries Tagged with: libraries • puns Date: October 27th, 2010 dw

July 28, 2009

Annals of openness in peril

1. The court has rejected Charlie Nesson’s basic defense of Joel Tenenbaum’s sharing of music files. The case is going to the jury, which may levy the same sort of insanely excessive fines as in the Jammie Thomas-Rasset trial. I hope Charlie’s team can convince the jury that the fines and the entire process are so onerous and disproportionate that the RIAA has been abusing the court system. Of course, IANAL, and IANAOTJ (I am not on the jury).

2. Barnes and Noble has launched its e-book software. It runs on iPhones as well as on PCs and Macs. I’m having trouble finding which formats it supports, but judging from its Open dialog, not PDF, .doc, .html, .mobi, or text. It does support .PDB books.

After a very very quick session playing with it, it seems quite competitive with the Kindle, and because I’m running it on my Mac and not on the little piece of crippled hardware I bought from Amazon — the Kindle is just barely adequate as a reader, and is still overpriced by more than 100% in terms of its value, imo — having the use of a keyboard and a mouse is a big step up. And, unlike the Kindle, you can use whatever fonts you have on your machine. Still, it’s only incrementally better than the Kindle’s software (again, on a quick look), not a great leap forward for readers.

One of B&N’s big advantages is that it’s hooked into Google Books, enabling you to download public domain books that Google has scanned in. You do this by searching for a book on the B&N site and noticing the “free from Google Books” label. Be sure to sort by price; otherwise B&N lists the for-pay versions first. If B&N wants to be aggressive in this space (= succeed), it should create an easy-to-find section that lets you browse Google’s free books. Get us using the ereader and then sell us the copyrighted books. (If B&N has such a section, I couldn’t find it quickly enough.)

BTW, I presume (and thus may be wrong) that Google did a special deal with B&N to enable this. If so, I find it worrisome. If Google is going to be granted a special right to scan in books without fear of copyright reprisals, it will be the de facto national e-library, discouraging others from undertaking similarly scaled scanning projects, and thus should be making its public domain books equally and maximally freely available. IMO.

2a. [Later that evening:] B&N stores are now providing free Wifi. Yay!

3. Apple is not permitting the Google telephone service into the Apple App store, thus simultaneously and inadvertently making the case for Zittrainian generativity.

4. [Later that day]: On the happy front, Google has open-sourced an implementation of Wave.

[Tags: copyright copyleft books e-books google libraries everything_is_miscellaneous charles_nesson jonathan_zittrain law fair_use amazon kindle b&n ]

Categories: Uncategorized Tagged with: amazon • books • cluetrain • copyleft • copyright • digital rights • e-books • everythingIsMiscellaneous • google • kindle • law • libraries • media Date: July 28th, 2009 dw

July 26, 2009

The Guardian on miscellaneous bookshelves

The Guardian has a fun article on schemes for arranging the books on your shelf, with an interesting set of comments. (It makes me want to send the entire thread a copy of Everything Is Miscellaneous.)

[Tags: everything_is_miscellaneous dewey the_guardian ]

Categories: Uncategorized Tagged with: dewey • everythingIsMiscellaneous • everything_is_miscellaneous • libraries • taxonomy • the_guardian Date: July 26th, 2009 dw

July 11, 2009

Reslicing publications

The OCLC has an experimental site up that provides classification information for books and pubs. You type in the book’s title and author (or ISBN number, or other such ID), and it returns info about the various editions and how they’re classified in the OCLC’s Dewey Decimal Classification System or by the Library of Congress. You can then see the other books that share its Dewey Decimal number (for example, here’s Everything Is Miscellaneous, #303.4833>>Social sciences>>Social sciences, sociology & anthropology>>Social processes), at the OCLC’s useful Dewey Browser. Alas, when you click on the Library of Congress number, you get taken to a demand by the LC that you subscribe to Classification Web, instead of to the free LC Catalog (where my Misc book is listed like this).

Lots of metadata about the metadata…Gotta love it!

[Tags: everything_is_miscellaneous dewey_decimal oclc libraries books metadata ]

Categories: Uncategorized Tagged with: books • dewey_decimal • everythingIsMiscellaneous • everything_is_miscellaneous • libraries • metadata • oclc • taxonomy Date: July 11th, 2009 dw

June 9, 2009

[berkman] Lewis Hyde on the Commons

Lewis Hyde is giving a Berkman talk about the book he’s working on. The book is about the ownership of art and ideas, and argues that they should lie in a cultural commons, rather than be treated as property.

Lewis begins by talking about what a commons is. The term comes from medieval property ideas, and Lewis thinks of commons as a kind of property. He asks the group for a definition of property. Suggestions from the audience: “Exclusive rights.” “Anything I can use and have some degree of control over, not necessarily exclusively.” Lewis says that a 1900 dictionary defines property as that over which one has “rights of action.” Property is a bundle of rights of action. Lewis likes this definition because it includes human actors. Blackstone defines property rights in maximalist terms: the right to exclude the entire universe. Scalia also thinks property is the right to exclude. Lewis thinks the right to exclude is one of the bundle, not the whole thing. This is because, he says, he’s interested in commons. (He notes that in medieval times, “common” could be used as a verb. E.g., “a man may common in the forest.”)

Lewis talks about Hardin’s “The Tragedy of the Commons” essay. In fact, traditionally commons had governance rules to prevent the destruction of the commons’ asset, including the right of exclusion. “Commons were in fact not tragic. They lasted for millennia in Europe. Not tragic because they were rule-governed and stinted.” Why has the phrase “The tragedy of the commons” persisted? In part, because the phrase is catchy. In part because Hardin proposed it during the Cold War and it was taken as showing that common-ism doesn’t work.

There used to be an annual ritual of “beating the bounds,” to prevent any gradual encroachment on the commons. “These were convivial affairs.” Lewis wonders if there are ways we can recover this resistance to encroachment.

Applied to the cultural realm, Lewis thinks cultural products are by nature in a commons. In the 18th century you get the idea that we could own poems, novels, etc. Until then, people thought of property as applying only to land. If something is not excludable, there’s no property in it. Many argued in the 18th century that therefore artistic works can’t be property. (Lewis recommends Terry Fisher’s article on philosophies of property. Terry points to four: labor, moral rights, commercial utilitarianism, and civic utilitarianism.)

The first copyright law was in 1710 (Statute of Anne). By giving authors and publishers rights, it removed the “in perpetuity” of the crown’s monopolistic grants. It also created the public domain by creating a clear limit on the term of ownership: After 14 years, it enters the public domain. It’s as if the commons is the default state, says Lewis.

Jamie Boyle talks about the “second enclosure,” in which everything is copyrighted by default and the term is extended. The second enclosure is an enclosure of the mind, says Boyle. Lewis now thinks there might be a third enclosure: The enclosure of wilderness of the mind. Lewis agrees that it makes sense to let the creator of a work, say a novel, get rewarded for it. “I wrote it, so it’s mine.” But, asks Lewis, what does the “I” mean? What is the self? He cites a 12th century Buddhist: “We study the self to forget the self.” To forget the self is to wake up to the world around you. Creativity comes out of self-abnegation. To get to something truly new, you have to leave a door open to the unknown. We usually think that the outside of owned property is the public domain. But that’s a domesticated sphere, things we are familiar with. There’s an old tradition that during the period of maturation, you have to leave the known world, go away from where instruction is given, and become familiar with your ignorance. (Lewis says he’s drawing on Thoreau.)

He takes an example from Jonathan Zittrain. When the Apple II came out, there was a spurt in sales because the first spreadsheet emerged, something that had not been expected. If you want a generative Internet, you have to be careful about what you lock down. Another example: In the 1980s, San Diego cell biologists patented a sequence of amino acids. They didn’t know its biological purpose. Ten years later, other researchers think that that sequence blocks blood to tumors. The patent owners sued the researchers. The patent gums up the system. Exploratory science goes into the unknown. “To enclose wilderness means giving property rights in areas where we as yet have no understanding what’s happening.” Lewis adds: “This makes no sense.” Lewis would like us to restore the idea that there are things that are unowned.

Emblematic of the third enclosure is silence. John Cage in 1952 came to Harvard to see/hear a completely soundproofed room. But Cage could hear a low rumbling and high whining. The low rumbling is the sound of your blood and the high whining is the sound of your nervous system. Silence for Cage meant not no sound but non-intention. He composed “4 mins and 33 seconds” which is a stretch of silence. The audience hears the ambient noise. In 2002 a rock group called the Planets put in a minute of silence. As a joke/homage, they credited it to Cage. The royalty-collecting societies started to send checks to Cage’s publisher. The publisher sued for copyright infringement on moral rights grounds (i.e., misattribution). They settled. But Cage held a Buddhist-like view of artistic creation. He tried to remove the self. A lot of copyright law assumes the work contains the imprint of the author’s personality. That’s one of the reasons we give a copyright. But those laws can get in the way of our ability to live in the wilderness, i.e., the third enclosure. How do you become a creator in a world in which scientists can patent unknown sequences and silence can be copyrighted?

Q: Maybe part of the problem in defending the commons is that we say we’re defending freedom, not as in free beer. Fighting for free beer is more compelling than fighting for free speech.
A: Beating the bounds was a fun event. So, yes, people have to want to do this.

Q: [me] How do we counter the fairness argument: If I did it, I ought to get the reward. How do we respond to that?
A: It’s hard to do this in political debate because it’s a long argument. I raise the question of the “I”: To what extent is my contribution really from me? With cultural works, you’re working in a vast sea of existing material. What you create is not entirely yours. Even if it becomes popular and useful, it’s other people who made it so. You can also point to the utilitarian consequences: The public interest is advanced by enabling things to enter the public domain.

Q: [jason] You’re making a creativity defense, i.e., that the commons is generative. But, if we take Cage or Thoreau to heart and say that true creativity consists of transcending the self, could we say that that leads to saying all works should be owned, so that you’re forced to create something new?
A: The puzzle is how much you can actually go to the wilderness. You can face it, but there’s no way to escape the world you come out of. Thoreau has The Iliad with him. There’s no way to escape the known. You always work from materials you’ve collected elsewhere.

Q: [ethanz] What’s so bad about private property? You’re hearkening back to a romantic conception that worked for a very small set of people. We’ve got an enormous amount of development based on increasingly strong enclosure movements. Those movements have given us a great deal of what we love. Despite the first and second enclosures, creativity seems not to have been much hindered. Why should we worry about the third enclosure? Couldn’t we say that you’re attempting to protect and defend something that most of us have not experienced? How do we know that your romantic vision is superior to the world we’re interacting with?
A: I’m not against private property. The question is always where the lines should be drawn. I think we’ve extended the right to exclude too far. Yes, the world is quite creative. But we don’t know what we’re missing. With the enclosing of wilderness, we’re enclosing that which we don’t know about. Researchers are reluctant to do certain kinds of work, for fear of being sued.
Ethan: My diabetes medicine — recombinant DNA — exists because Eli Lilly worked within enclosures. How do we know we would have made the same progress if those enclosures weren’t there?
A: Let’s leave that hanging as a question. It’s a good question. You’re right that the existing dominant system has produced remarkable results.

Q: Michael Heller in The Gridlock Economy goes through the economic models that explain what we lose by locking stuff down. What’s the cultural loss?
A: Lessig and others write books about this…

[Tags: lewis_hyde copyright commons copyleft science art ]

Categories: Uncategorized Tagged with: art • commons • copyleft • copyright • culture • digital culture • digital rights • everythingIsMiscellaneous • knowledge • libraries • science Date: June 9th, 2009 dw

June 5, 2009

New open access blog

Stuart Shieber, one of Harvard’s Open Access ringleaders, has started a blog on that topic. He says it’ll be occasional (maybe per week, not per day) and it promises to be reflective and important to those who care about making more of the world’s research and knowledge available to, well, the world. (Stuart is the director of Harvard’s Office for Scholarly Communication, and was one of the important voices in the push for Harvard’s open access initiatives.)

[Tags: open_access stuart_shieber harvard everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: digital rights • education • everythingIsMiscellaneous • everything_is_miscellaneous • harvard • knowledge • libraries • open_access • stuart_shieber Date: June 5th, 2009 dw

June 1, 2009

Law journal goes open access

The Columbia Science and Technology Law Review is going open access:

…we’ve refined our author agreement (already very liberal) to explicitly ensure that authors retain their copyrights, and we’re making our agreement public on our website. At the same time, we’re also embracing open publication, formally putting our articles under a Creative Commons Non-Commercial No-Derivatives license, and allowing our authors to distribute themselves under even more liberal licenses if they so choose.

Yay!

[Tags: open_access journals law_journals everything_is_miscellaneous ]

Categories: Uncategorized Tagged with: digital rights • education • everythingIsMiscellaneous • everything_is_miscellaneous • journals • knowledge • law_journals • libraries • open_access Date: June 1st, 2009 dw
