
June 18, 2011

An argument for loosening copyright

Culture does not exist simply to enlighten us.

Culture’s far more common role is to give us something to talk about.

If we have nothing to talk about, nations divide over unreasonable differences, communities reduce to parking regulations, and marriages end in dinnertime squabbles.

To talk about things in a depth that binds requires freely accessing, citing, quoting, pointing, and linking.

Therefore, for the sake of our nation, communities, and marriages, we need to loosen copyright’s hold.

QED


Categories: copyright, open access Tagged with: copyright Date: June 18th, 2011 dw


June 14, 2011

Linked Open Data take-aways

I just wrote up an informal trip report in the form of “take-aways” from the LOD-LAM conference I attended a couple of weeks ago. Here is a lightly edited version.

 


Because it was an unconference, it was too participatory to enable us to take systematic notes. I did, however, interview a number of attendees, and have posted the videos on the Library Innovation Lab blog site. I actually have a few more yet to post. In addition, during the course of one of the sessions (on “Explaining LOD-LAM”), a few of us began constructing a FAQ.

Here’s some of what I took away from the conference.

– There is considerable momentum around linked open data, starting with the sciences where there is particular research value in compiling huge data sets. Many libraries are joining in.

– LOD for libraries will enable a very fluid aggregation of information from multiple types of sources around any particular object. E.g., a page about a Hogarth illustration (or about Hogarth, or about 18th-century London, etc.) could quite easily aggregate information from any data set that knows something about that illustration or about topics linked to it. This information could be used to build a page or to do research. (A short code sketch of this pattern follows this list.)

– Making data and metadata available as LOD enables maximal re-use by others.

– Doing so requires expertise, but should be less massively difficult than supporting many other standards.

– For the foreseeable future, this will be something libraries do in addition to supporting more traditional data standards; it will be an additional expense and effort.

– Although there is continuing debate about exactly which license to use when publishing library data sets, it seems that putting any license on the data other than a public-domain waiver is likely to be (a) futile and (b) so difficult to deal with that it will inhibit re-use of the data, depriving it of value. (See the 4-star license proposal that came out of this conference.)

– The key point of resistance against LOD among libraries, archives, and museums is the justified fear that once the data is released into the world, the curating institutions can no longer ensure that the metadata about an object is correct; the users of LOD might pick up a false attribution, an inaccurate description, etc. This is a genuine risk, since LOD permits irresponsible use of data. The risk can be mitigated but not removed.
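As a concrete illustration of the aggregation take-away above, here is a minimal Python sketch using the rdflib library. This is my illustration, not anything presented at the conference; it assumes DBpedia’s URI for Hogarth and a live network connection:

```python
# A minimal sketch of LOD aggregation (assumes rdflib is installed and
# that DBpedia's URI is the shared identifier for Hogarth).
from rdflib import Graph, URIRef

hogarth = URIRef("http://dbpedia.org/resource/William_Hogarth")

g = Graph()
# DBpedia publishes dereferenceable RDF, so parsing the URI pulls in
# everything that one dataset asserts about the subject.
g.parse(str(hogarth))

# Any other dataset that reuses this URI (or links to it via owl:sameAs)
# can be merged into the same graph with another parse() call.
for predicate, obj in g.predicate_objects(subject=hogarth):
    print(predicate, obj)
```

The same pattern extends to any library, archive, or museum dataset that publishes dereferenceable URIs: parse each source into one graph, then build your page or run your research queries across all of them at once.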


Categories: copyright, culture, everythingIsMiscellaneous, libraries, open access, too big to know Tagged with: 2b2k • archives • everythingIsMiscellaneous • libraries • lod • lod-lam • metadata • museums • open access Date: June 14th, 2011 dw


June 10, 2011

[hyperpublic] Final panel: Cooperation without Coercion

At the final panel of the conference. Judith Donath is moderating.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Charlie Nesson asks: “When we talk about our space, who are we?” In Jeff Huang‘s presentation, it seemed like he was given the perfect hypothetical — a desert — to build a public and private place. “In cyber terms, we are people of the Net. What then is our domain? It’s the public domain. And if you are to build the public domain, then I believe the wisdom to follow from a lawyer’s point of view is the same wisdom that has more or less informed the world of real property. If you want an orderly world of real property, you build a registry. If you want an orderly world of bits, you build a registry.” This is Charlie’s new project: a registry of the public domain. They’re starting with IMSLP.org, a musical score library. It has 93,000 musical scores in the public domain, exquisitely put together.

The Net divides into two domains, says Charlie, one that is free and one that is not. Free means free of copyright and other encumbrances. Charlie wants to build our domain on a foundation solid in law. The registry he’s building identifies works as public domain, with links to the registrars attesting to this. He wants it to be populated by librarians with public domain collections. But, the problem with registries is litigation risk, i.e., the threat of lawsuit. “So the essence of this idea is to couple the registrar with a pro bono commitment of legal service from a law firm of repute to defend litigation based on infringement.”

Where do you find the institutions that want to protect privacy, asks Charlie. How about libraries, he suggests.

“I’m tough on privacy, Judith,” says Charlie, in response to a question. “I’ve never liked it.” He explains it’s so often based on fear and looks backwards.

Martin Nowak looks at cooperation in evolutionary terms: a donor pays a cost and a recipient gets a benefit. He explains game theory’s Prisoner’s Dilemma. Why do people cooperate? “Natural selection chooses defection,” rather than cooperation. In a mixed population, defection becomes increasingly more popular. So, natural selection needs help to favor cooperation. Martin categorizes the factors into five mechanisms: kin selection, direct reciprocity, indirect reciprocity, spatial selection, and group selection.

Direct reciprocity (I help you, you help me). If you play the Prisoner’s Dilemma several times, the economics change, as the Folk Theorem shows. Martin quickly summarizes Axelrod and Rapoport. [Too hard to live blog. Read Ethanz. Really. Now.] Errors turn out to ruin cooperation, so you need a process that allows for forgiveness. Martin’s doctoral dissertation showed that if everyone plays randomly, the right tactic is to always defect. A tit-for-tat strategy corrects that, and generous tit-for-tat (I may still cooperate even if you defect) provides a math model for the evolution of forgiveness and cooperation. There are always oscillations; cooperation is never stable. We need structures that rebuild cooperation quickly after it is destroyed, because it always will be destroyed.
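To make that dynamic concrete, here is a toy simulation, my sketch rather than Nowak’s actual model: an iterated Prisoner’s Dilemma with execution noise, in which strict tit-for-tat falls into long runs of retaliation while generous tit-for-tat forgives occasionally and recovers cooperation:

```python
# Toy iterated Prisoner's Dilemma with noisy moves: a sketch of the
# dynamic described above, not Nowak's model. Payoffs use T > R > P > S.
import random

R, S, T, P = 3, 0, 5, 1          # reward, sucker, temptation, punishment
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
NOISE = 0.05                     # chance an intended move flips

def tft(opp_last):
    return opp_last              # copy the opponent's previous move

def gtft(opp_last, generosity=0.3):
    # After a defection, still cooperate with some probability:
    # the "forgiveness" that breaks cycles of retaliation.
    return "C" if opp_last == "C" or random.random() < generosity else "D"

def match(strat_a, strat_b, rounds=10_000):
    a_last = b_last = "C"
    total_a = total_b = 0
    for _ in range(rounds):
        a, b = strat_a(b_last), strat_b(a_last)
        if random.random() < NOISE:
            a = "D" if a == "C" else "C"
        if random.random() < NOISE:
            b = "D" if b == "C" else "C"
        pay_a, pay_b = PAYOFF[(a, b)]
        total_a += pay_a
        total_b += pay_b
        a_last, b_last = a, b
    return total_a / rounds, total_b / rounds

print("TFT  vs TFT :", match(tft, tft))    # noise drags this well below R
print("GTFT vs GTFT:", match(gtft, gtft))  # stays near mutual cooperation
```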

Direct reciprocity allows for the evolution of cooperation if there’s a prospect of another round. Indirect reciprocity (I help you, someone helps me) leads to cooperation if reputation matters. You need natural selection to care about reputation, so to speak. “What you need for indirect reciprocity is gossip” to spread reputation. For that you need language. “You could argue this is the selection process that led to language.” “For direct reciprocity you need a face. For indirect reciprocity you need a name.” (David Haig) Our brain has both capabilities. If interactions are completely anonymous, you run into problems. Also, you need gossip to be relatively honest.

Spatial selection = neighbors help each other. Martin flips through some graphs showing that it selects for cooperation if you have a few close friends. Likewise, evolutionary set theory says that people wanting to join particular groups can also lead to cooperation.

Judith: What about strong vs. weak ties?
Martin: We assume equal ties. There’s a trade-off between wealth and vulnerability.

Nicholas Negroponte asks himself a question every morning: Is he doing something that normal market forces would do anyway? If so, he stops. He wants to do that which market forces will not do.

There are now 3M One Laptop Per Child laptops in the hands of kids. This isn’t huge, since OLPC would like to get laptops into the hands of about 500M kids. Before OLPC, people assumed computers teach by imparting content. Instead, you want to see children teaching. 20-30% of the million Peruvian kids with OLPC machines are using them to teach their parents how to read.

Nicholas goes through some points he made in a talk at the UN recently. Among the points: Measurement is overrated. You only measure when the changes are so small that you can only see them by measurement.

Judith: When we see well-off kids sitting side by side looking into screens, we think it’s a nightmare of anti-sociality, but when we see your adorable photos of third world kids in the same position, it looks desirable?
Nicholas: I don’t see the well-off kids that way. And why don’t we make OLPC’s available in the US? Because the issues are deeper than that.

Q: Talk about anonymity…?
Jeff Jarvis: It’s foundational to democracy. It’s getting a bad name because of trolls. But it must be protected.

Q: This discussion is soaked in privilege. There’s much inscribed in the language that affects how people act. When you idolize the public space as a place where all can share their ideas safely, it feels really far away for me.

Q: (Charlie) Nicholas, you’ve said that Uruguay has given all 500,000 of its kids OLPCs. Given your position on measurement, what change will we see?
Nicholas: Their curiosity, the way they approach problems, the way they look at things…I think you’re going to see a nation that is far more creative than many other nations. Nicholas tells a story of a kid whose homework got 100K hits.
Martin: Who teaches them how to use it?
Nicholas: It’s genetic :) We’re going to do a scientific experiment in which we drop OLPC laptops out of helicopters onto remote villages and come back in a year and see how many have learned how to read.

Q: (urs gasser) One vision says build a great tool and see what happens. The other is to study human behavior scientifically. (Nicholas vs. Martin.) How difficult is it to translate scientific findings about human behavior into technology?
Martin: I’m fascinated by mathematics, but we do apply it to practical issues. In the field of cooperation, we’d like to bring the models closer to human observations. For example, many cultures like punishment, but I think it doesn’t work well to create cooperation because it creates complications. Reward seems better. So, we study that. We do the same experiment in multiple cultures. In Romania, for example, people differentiated between public and private outcomes, because they lacked faith that public engagement had positive outcomes.

Q: (zeynep) The Net has let the cooperative side of human nature be more manifest. Does your work in evolutionary biology take account of this?
A: The cooperation we see in the animal world must rely on direct observation. Humans can communicate. We don’t have to rely on our personal experience with another to decide whether to cooperate. The Net can help us evaluate others quickly.


Categories: copyright, culture, education, liveblog, science Tagged with: commons • cooperation • evolution • hyperpublic • olpc • prisoner's dilemma • public domain Date: June 10th, 2011 dw


June 8, 2011

MacKenzie Smith on open licenses for metadata

MacKenzie Smith of MIT and Creative Commons talks about the new 4-star rating system for open licenses for metadata from cultural institutions:

The draft is up on the LOD-LAM site.

Here are some comments on the system from open access guru Peter Suber.


Categories: copyright, culture, open access Tagged with: archives • copyright • libraries • lod-lam • lodlam • metadata • museums • open access • peter suber Date: June 8th, 2011 dw


June 3, 2011

Happy birthday, Larry!

One of the heroes of the Internet turns 50, and the people who love him (and there are a lot of us) thank him on this perfectly appropriate video:

Larry Lessig 50th Birthday Lip Sync Tribute from Daniel Jones on Vimeo.


Categories: copyright, culture Tagged with: birthday • free culture • larry lessig • lessig • mashups Date: June 3rd, 2011 dw


May 27, 2011

A Declaration of Metadata Openness

Discovery, the metadata ecology for UK education and research, invites stakeholders to join us in adopting a set of principles to enhance the impact of our knowledge resources for the furtherance of scholarship and innovation…

What follows is a set of principles that are hard to disagree with.


Categories: copyright, education, egov, everythingIsMiscellaneous, open access, too big to know Tagged with: 2b2k • everythingIsMiscellaneous • metadata • open access Date: May 27th, 2011 dw


May 10, 2011

[berkman] Culturomics: Quantitative analysis of culture using millions of digitized books

Erez Lieberman Aiden and Jean-Baptiste Michel (both of Harvard, currently visiting faculty at Google) are giving a Berkman lunchtime talk about “culturomics”: the quantitative analysis of culture, in this case using the Google Books corpus of text.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The traditional library behavior is to read a few books very carefully, they say. That’s fine, but you’ll never get through the library that way. Or you could read all the books, very, very not carefully. That’s what they’re doing, with interesting results. For example, it seems that irregular verbs become regular over time. E.g., “shrank” will become “shrinked.” They can track these changes. They followed 177 irregular verbs and found that 98 are still irregular. They built a table, looking at how rare the words are. “Regularization follows a simple trend: If a verb is 100 times less frequent, it regularizes 10 times as fast.” Plus you can make nice pictures of it:


Usage is indicated by font size, so that it’s harder for the more used words to get through to the regularized side.
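Taken at face value, that trend means a verb’s regularization half-life grows as the square root of its usage frequency. A quick check of the arithmetic (my framing, with an arbitrary constant):

```python
# The quoted rule implies half-life ~ k * sqrt(frequency); k is an
# arbitrary constant for illustration (my framing, not the speakers' code).
half_life = lambda freq, k=1.0: k * freq ** 0.5

# A verb 100x less frequent regularizes 10x as fast:
print(half_life(1.0) / half_life(0.01))   # -> 10.0
```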


The Google Books corpus of digitized text provides a practical way to be awesome. Erez and Jean-Baptiste got permission from Google to trawl through that corpus. (It is not public because of the fear of copyright lawsuits.) They produced the n-gram browser. They constructed a table of phrases, 2B lines long.


129M books have been published. 18M have been scanned. They’ve analyzed 5M of them, creating a table with 2 billion rows. (In some cases, the metadata wasn’t good enough. In others, the scan wasn’t good enough.)
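To show the shape of such a table (my sketch; Google’s actual pipeline is of course vastly larger), here is a toy n-gram count table keyed by phrase and publication year:

```python
# Toy n-gram table: counts of each 1- and 2-word phrase per year.
from collections import Counter

def ngrams(tokens, n):
    return zip(*(tokens[i:] for i in range(n)))

table = Counter()
books = [(1995, "he shrank from the task"),
         (2005, "he shrinked from the task")]
for year, text in books:
    tokens = text.lower().split()
    for n in (1, 2):
        for gram in ngrams(tokens, n):
            table[(" ".join(gram), year)] += 1

print(table[("he shrank", 1995)])    # -> 1
print(table[("he shrinked", 2005)])  # -> 1
```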

They show some examples of the evolution of phrases, e.g., thrived vs. throve. As a control, they looked at 43 Heads of State and found that in the year they took power, usage of “head of state” zoomed (which confirmed that the n-gram tool was working).


They like irregular verbs in part because they work out well with the ngram viewer, and because there was an existing question about the correlation of irregular and high-frequency verbs. (It’d be harder to track the use of, say, tables. [Too bad! I’d be interested in that as a way of watching the development of the concept of information.]) Also, irregular verbs manifest a rule.


They talk about chode’s change to chided in just 200 yrs. The US is the leading exporter of irregular verbs: burnt and learnt have become regular faster than other verbs, with American usage leading British usage.


They also measure some vague ideas. For example, no one talked about 1950 until the late 1940s, and it really spiked in 1950. We talked about 1950 a lot more than we did, say, 1910. The fall-off rate indicates that “we lose interest in the past faster and faster in each passing year.” They can also measure how quickly inventions enter culture; that’s speeding up over time.


“How to get famous?” They looked at the 50 most famous people born in 1871, including Orville Wright, Ernest Rutherford, and Marcel Proust. As soon as these names passed the initial threshold (getting mentioned in the corpus as frequently as the least-used words in the dictionary), their mentions rise quickly and then slowly decline. The class of 1871 got famous at age 34; their fame doubled every four years; they peaked at 73, and then mentions go down. The class of 1921’s rise was faster, and they became famous before they turned 30. If you want to become famous fast, you should become an actor (because actors become famous in their mid-to-late 20s), or wait until your mid 30s and become a writer. Writers don’t peak as quickly. The best way to become famous is to become a politician, although you have to wait until you’re 50+. You should not become an artist, physicist, chemist, or mathematician.


They show the frequency charts for Marc Chagall, US vs. German. His German fame dipped to nothing during the Nazi regime, which suppressed him because he was Jewish. Likewise with Jesse Owens. Likewise with Russian and Chinese dissidents. Likewise for the Hollywood Ten during the Red Scare of the 1950s. [All of this of course equates fame with mentions in books.] They show how Elia Kazan’s and Albert Maltz’s fame took different paths after Kazan testified to a House committee investigating “Reds” and Maltz did not.


They took the Nazi blacklists (people whose works should be pulled out of libraries, etc.) and watched how they affected the mentions of the people on them. Of course the mentions went down during the Nazi years. But the names of Nazis went up 500%. (Philosophy and religion were suppressed 76%, the most of all.)


This led Erez and Jean-Baptiste to think that they ought to be able to detect suppression without knowing about it beforehand. E.g., Henri Matisse was suppressed during WWII.


They posted their ngrams viewer for public access. From the viewer you can see the actual scanned text. “This is the front end for a digital library.” They’re working with the Harvard Library [not our group!] on this. In the first day, over a million queries were run against it. They are giving “ngrammies” for the best queries: best vs. beft (due to a character recognition error); fortnight; think outside the box vs. incentivize vs. strategize; argh vs. aargh vs. aaargh vs. aaaargh. [They quickly go through some other fun word analyses, but I can’t keep up.]


“Culturomics is the application of high-throughput data collection and analysis to the study of culture.” Books are just the start. As more gets digitized, there will be more we can do. “We don’t have to wait for the copyright laws to change before we can use them.”


Q: Can you predict culture?
A: You should be able to make some sorts of predictions, but you have to be careful.


Q: Any examples of historians getting something wrong? [I think I missed the import of this]
A: Not much.


Q: Can you test the prediction ability with the presidential campaigns starting up?
A: Interesting.


Q: How about voice data? Music?
A: We’ve thought about it. It’d be a problem for copyright: if you transcribe a score, you have a copyright on it. This loads up the field with claimants. Also, it’s harder to detect single-note errors than single-letter errors.


Q: Do you have metadata to differentiate fiction from nonfiction, and genres?
A: Google has this metadata, but it comes from many providers and is full of conflicts. The ngram corpus is unclean. But the Harvard metadata is clean and we’re working with them.


Q: What are the IP implications?
A: There are many books Google cannot make available except through the ngram viewer. This gives digitizers a reason to digitize works they might otherwise leave alone.


Q: In China people use code words to talk about banned topics. This suppresses trending.
A: And that takes away some of the incentive to talk about it. It cuts off the feedback loop.


Q: [me] Is the corpus marked up with structural info that you can analyze against, e.g., subheadings, captions, tables, quotations?
A: We could but it’s a very hard problem. [Apparently the corpus is not marked up with this data already.]

Q: Might you be able to go from words to metatags: if you have cairo, sphinx, and egypt, you can induce “egypt.” This could have an effect on censorship since you can talk about someone without using her/his name.
A: The suppression of names may not be the complete suppression of mentions, yes. And, yes, that’s an important direction for us.


Categories: berkman, copyright, too big to know Tagged with: 2b2k • berkman • google • irregular verbs • library Date: May 10th, 2011 dw


May 4, 2011

Open Access soars

Some facts and stats, compiled at PoeticEconomics:

# of open access journals : over 6,000. Growth rate: 4 per day.

# of freely available journals: over 28,000. Growth rate: 10 per day.

# of open access repositories: close to 2,000. Growth rate: 1 per day.

# of documents freely available: 25 million. Growth rate: 6,000 per day.

# of open access mandate policies: 271. Growth rate: 1 per week or 5 per month.

% of world’s scholarly literature that is freely available: 20%
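For a sense of scale, here is a naive linear projection from those figures, assuming the stated daily rates simply hold (which they surely won’t, exactly):

```python
# Back-of-the-envelope projection from the figures above; assumes the
# stated growth rates stay constant, purely as an illustration.
journals, journals_per_day = 6_000, 4
docs, docs_per_day = 25_000_000, 6_000
years = 5

print(journals + journals_per_day * 365 * years)  # ~13,300 OA journals
print(docs + docs_per_day * 365 * years)          # ~36 million free documents
```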

The sources are here.


Categories: copyright, open access Tagged with: copyright • oa • open access Date: May 4th, 2011 dw


April 20, 2011

Google’s copyright cartoon

Google’s educational copyright cartoon is amusing in a Ren and Stimpy sort of way.

But it’s disturbing that the cartoon purposefully makes the Fair Use “explanation” unintelligible. Presumably that’s because Fair Use is so complex and so difficult to defend that Google doesn’t even want to raise it as a possibility. Nevertheless, it seems like a missed opportunity to do some education. Worse, it’s a sign that we’ve pretty much given up on Fair Use.

Likewise, many of us were disappointed when Google Books dropped its Fair Use defense and instead came up with a settlement (since overturned) with the authors and publishers. It was another lost opportunity to provide Fair Use with some clarity and oomph.

Fair Use doesn’t need just a posse (Lord bless it). It could use a bigtime hero with some guts.


Categories: copyright Tagged with: copyright • fair use • google • google books Date: April 20th, 2011 dw


March 26, 2011

Doing Google Books right

Having written in opposition to the Google Books Settlement (1 2 3), I was pleased with Judge Chin’s decision overall. The GBS (which, a couple of generations ago, would have unambiguously referred to George Bernard Shaw) was worked out by Google, the publishers, and the Authors Guild without schools, libraries, or readers at the table. The problems with it were legion, although over time it had gotten somewhat less obnoxious.


Yet, I find myself slightly disappointed. We so desperately need what Google was building, even though it shouldn’t have been Google (or any single private company) that was building it. In particular, the GBS offered a way forward on the “orphaned works” problem: works that are still in copyright but whose copyright owners can’t be found and often are probably long dead. So, you come across some obscure 1932 piece of music that hasn’t been recorded since 1933. You can’t find the person who wrote it because, let’s face it, his bone sack has been mouldering since Milton Berle got his own TV show, and the publishers of the score went out of business before FDR started the Lend-Lease program. You want to include 10 seconds of it in your YouTube ode to the silk worm. You can’t, because some dead guy and his defunct company can’t be exhumed to nod permission. Multiply this times millions, and you’ve got an orphaned works problem that has locked up millions of books and songs in a way that only a teensy dose of common sense could undo. The GBS applied that common sense: royalties would be escrowed for some period in case the rights owner staggered forth from the grave to claim them. Of course the GBS then divvied up the unclaimed profits in non-common-sensical ways. But at least it broke the logjam.


Now it seems it’ll be up to Congress to address the orphaned works problem. But given Congress’ maniacal death-grip on copyright, it seems unlikely that common sense will have any effect, and our culture will continue to be locked up for seventy years beyond the grave in order to protect the 0.0001 percent of publishers’ catalogs that continue to sell after fourteen years. (All numbers entirely made up for your reading pleasure.)


As Bob Darnton points out, this is one of the issues that a Digital Public Library of America could address.

 


James Grimmelmann has an excellent and thorough explanation of the settlement, and a prediction for its future.


Categories: copyright, libraries Tagged with: copyleft • copyright • dpla • gbs • google books • libraries Date: March 26th, 2011 dw



