Joho the Blog » libraries

August 4, 2013

Paradata

Hanan Cohen points me to a blog post by an MLIS student at Haifa U., named Shir, in which she discourses on the term “paradata.” Shir cites Mark Sample, who in 2011 posted a talk he had given at an academic conference. Mark notes the term’s original meaning:

In the social sciences, paradata refers to data about the data collection process itself—say the date or time of a survey, or other information about how a survey was conducted.

Mark intends to give it another meaning, without claiming to have worked it out fully:

…paradata is metadata at a threshold, or paraphrasing Genette, data that exists in a zone between metadata and not metadata. At the same time, in many cases it’s data that’s so flawed, so imperfect that it actually tells us more than compliant, well-structured metadata does.

His example is We Feel Fine, a collection of tens of thousands (or more … I can’t open the site because Amtrak blocks access to what it intuits might be intensive multimedia) of sentences that begin “I feel” from many, many blogs. We Feel Fine then displays the stats in interesting visualizations. Mark writes:

…clicking the Age visualizations tells us that 1,223 (of the most recent 1,500) feelings have no age information attached to them. Similarly, the Location visualization draws attention to the large number of blog posts that lack any metadata regarding their location.

Unlike many other massive datamining projects, say, Google’s Ngram Viewer, We Feel Fine turns its missing metadata into a new source of information. In a kind of playful return of the repressed, the missing metadata is colorfully highlighted—it becomes paradata. The null set finds representation in We Feel Fine.

So, that’s one sense of paradata. But later Mark makes it clear (I think) that We Feel Fine presents paradata in a broader sense: it is sloppy in its data collection. It strips out HTML formatting, which can contain information about the intensity or quality of the statements of feeling the project records. It’s lazy in deciding which images from a target site it captures as relevant to the statement of feeling. Yet, Mark finds great value in We Feel Fine.

His first example, where the null set is itself metadata, seems unquestionably useful. It applies to any unbounded data set. For example, that no one chose answer A on a multiple choice test is not paradata, just as the fact that no one has checked out a particular item from a library is not paradata. But that no one used the word “maybe” in an essay test is paradata, as would be the fact that no one has checked out books in Aramaic and Klingon in one bundle. Getting a zero in a metadata category is not paradata; getting a null in a category that had not been anticipated is paradata. Paradata should therefore include which metadata categories are missing from a schema. E.g., that Dublin Core does not have a field devoted to reincarnation says something about the fact that it was not developed by Tibetans.
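The zero-versus-null distinction can be made concrete with a rough sketch. Everything here is hypothetical: the `ANTICIPATED` option list, the `tally` helper, and the sample responses are invented for illustration, not taken from Mark's talk or from We Feel Fine.

```python
# Hypothetical schema: the answer options the test designer anticipated.
ANTICIPATED = ["A", "B", "C", "D"]


def tally(responses):
    """Separate anticipated counts from values the schema never planned for."""
    counts = {option: 0 for option in ANTICIPATED}  # zeros are ordinary metadata
    unanticipated = {}  # values with no category waiting for them
    missing = 0         # responses with no value at all
    for response in responses:
        if response is None:
            missing += 1
        elif response in counts:
            counts[response] += 1
        else:
            unanticipated[response] = unanticipated.get(response, 0) + 1
    return counts, unanticipated, missing


counts, unanticipated, missing = tally(["B", "C", None, "B", "maybe", None])
print(counts)         # the zero for "A" sits inside the schema
print(unanticipated)  # "maybe" falls outside it
print(missing)        # blanks are tracked rather than dropped
```

In this sketch the zero for answer A is just a count within an expected category. The blanks and the stray “maybe” are the candidates for paradata in the first sense, because nothing in the schema was set up to receive them.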

But I don’t think that’s at the heart of what Mark means by paradata. Rather, the appearance of the null set is just one benefit of considering paradata. Indeed, I think I’d call this “implicit metadata” or “derived metadata,” not “paradata.”

The fuller sense of paradata Mark suggests, “data that exists in a zone between metadata and not metadata,” is both useful and, as he cheerfully acknowledges, “a big mess.” It immediately raises questions about the differences between paradata and pseudodata: if We Feel Fine were being sloppy without intending to be, and if it were presenting its “findings” as rigorously refined data at, say, the biennial meeting of the Society for Textual Analysis, I don’t think Mark would be happy to call it paradata.

Mark concludes his talk by pointing at four positive characteristics of the We Feel Fine site: It’s inviting, paradata, open, and juicy. (“Juicy” means that there’s lots going on and lots to engage you.) It seems to me that the site’s only an example of paradata because of the other three. If it were a jargon-filled, pompous site making claims to academic rigor, the paradata would be pseudodata.

This isn’t an objection or a criticism. In fact, it’s the opposite. Mark’s post, which is based on a talk that he gave at the Society for Textual Analysis, is a plea for research that is inviting, open, juicy, and willing to acknowledge that its ideas are unfinished. Mark’s post is, of course, paradata.

June 25, 2013

The kids are reading more than we thought

A fascinating new report from Pew Internet includes the following:

As with other age groups, younger Americans were significantly more likely to have read an e-book during 2012 than a year earlier. Among all those ages 16-29, 19% read an e-book during 2011, while 25% did so in 2012. At the same time, however, print reading among younger Americans has remained steady: When asked if they had read at least one print book in the past year, the same proportion (75%) of Americans under age 30 said they had both in 2011 and in 2012.

In fact, younger Americans under age 30 are now significantly more likely than older adults to have read a book in print in the past year (75% of all Americans ages 16-29 say this, compared with 64% of those ages 30 and older). And more than eight in ten (85%) older teens ages 16-17 read a print book in the past year, making them significantly more likely to have done so than any other age group.

Also:

…younger Americans have a broad understanding of what a library is and can be—a place for accessing printed books as well as digital resources, that remains at its core a physical space.

Overall, most Americans under age 30 say it is “very important” for libraries to have librarians and books for borrowing;

Categories: libraries Tagged with: books • libraries Date: June 25th, 2013 dw

June 22, 2013

What I learned at LODLAM

On Wednesday and Thursday I went to the second LODLAM (linked open data for libraries, archives, and museums) unconference, in Montreal. I’d attended the first one in San Francisco two years ago, and this one was almost as exciting — “almost” because the first one had more of a new car smell to it. This is a sign of progress and by no means is a complaint. It’s a great conference.

But, because it was an unconference with up to eight simultaneous sessions, there was no possibility of any single human being getting a full overview. Instead, here are some overall impressions based upon my particular path through the event.

Serious progress is being made. E.g., Cornell announced it will be switching to a full LOD library implementation in the Fall. There are lots of great projects and initiatives already underway.
Some very competent tools have been developed for converting to LOD and for managing LOD implementations. The development of tools is obviously crucial.
There isn’t obvious agreement about the standard ways of doing most things. There’s innovation, re-invention, and lots of lively discussion.
Some of the most interesting and controversial discussions were about whether libraries are being too library-centric and not web-centric enough. I find this hugely complex and don’t pretend to understand all the issues. (Also, I find myself — perhaps unreasonably — flashing back to the Standards Wars in the late 1980s.) Anyway, the argument crystallized to some degree around BIBFRAME, the Library of Congress’ initiative to replace and surpass MARC. The criticism raised in a couple of sessions was that Bibframe (I find the all caps to be too shouty) represents how libraries think about data, and not how the Web thinks, so that if Bibframe gets the bib data right for libraries, Web apps may have trouble making sense of it. For example, Bibframe is creating its own vocabulary for talking about properties that other Web standards already have names for. The argument is that if you want Bibframe to make bib data widely available, it should use those other vocabularies (or, more precisely, namespaces). Kevin Ford, who leads the Bibframe initiative, responds that you can always map other vocabs onto Bibframe’s, and while Richard Wallis of OCLC is enthusiastic about the very webby Schema.org vocabulary for bib data, he believes that Bibframe definitely has a place in the ecosystem. Corey Harper and Debra Riley-Huff, on the other hand, gave strong voice to the cultural differences. (If you want to delve into the mapping question, explore the argument about whether Bibframe’s annotation framework maps to Open Annotation.)

I should add that although there were some strong disagreements about this at LODLAM, the participants seem to be genuinely respectful.

LOD remains really really hard. It is not a natural way of thinking about things. Of course, neither are old-fashioned database schemas, but schemas map better to a familiar forms-based view of the world: you fill in a form and you get a record. Linked data doesn’t even think in terms of records. Even with the new generation of tools, linked data is hard.
LOD is the future for library, archive, and museum data.
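Two of the points above, that linked data doesn't think in terms of records and that one vocabulary can be mapped onto another, can be sketched roughly in a few lines. This is only an illustration under stated assumptions: the `http://example.org/` namespace, the sample record, and the helper names are all made up, and the mapping targets are the Dublin Core terms `title` and `creator`. It is not how Bibframe or any of the tools discussed at LODLAM actually work.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only

# Forms-style view: fill in a form, get one record with fixed fields.
record = {"id": "book1", "title": "Example Book", "creator": "Example Author"}


def record_to_triples(rec, base=BASE):
    """Break a record into (subject, predicate, object) statements."""
    subject = base + rec["id"]
    return [
        (subject, base + "vocab/" + field, value)
        for field, value in rec.items()
        if field != "id"
    ]


# Mapping a local vocabulary onto a shared one (Dublin Core terms).
LOCAL_TO_SHARED = {
    BASE + "vocab/title": "http://purl.org/dc/terms/title",
    BASE + "vocab/creator": "http://purl.org/dc/terms/creator",
}


def remap(triples, mapping):
    """Swap predicates for their shared equivalents; leave unmapped ones as-is."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


triples = record_to_triples(record)
shared = remap(triples, LOCAL_TO_SHARED)
for statement in shared:
    print(statement)
```

Once the record is split into statements, nothing ties them together except the shared subject identifier, which is part of why the model feels unfamiliar. The `remap` step is the easy case of mapping, where each local property has an exact counterpart. The disagreements described above are about the cases where it doesn't.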

Here’s a list of brief video interviews I did at LODLAM:

Categories: everythingIsMiscellaneous, libraries Tagged with: everythingIsMiscellaneous • libraries • linked data • lodlam • metadata • standards Date: June 22nd, 2013 dw

June 21, 2013

[lodlam] Kevin Ford on the state of BIBFRAME

Kevin Ford, a principal member of the team behind the Library of Congress’ BIBFRAME effort (a modern replacement for the aging MARC standard), gives an update on its status, and addresses a controversy about whether it’s “webby” enough. (I liveblogged a session about this at LODLAM.)

Categories: interop, libraries, podcast Tagged with: bibframe • libraries • linked data • lodlam • marc • podcast Date: June 21st, 2013 dw

[lodlam] Kitio Fofack on why Linked Data

Kitio Fofack turned to Linked Data when creating a prototype app that aggregated researcher events. He explains why.

Categories: interop, libraries, podcast Tagged with: libraries • linked data • lodlam • podcast Date: June 21st, 2013 dw

Debra Riley-Huff on library data from a Webby point of view

Debra Riley-Huff [twitter: huff] explains what some of the library metadata standards (including BIBFRAME and Schema.org) look like from the point of view of a Web developer.

Categories: everythingIsMiscellaneous, libraries, podcast Tagged with: everythingIsMiscellaneous • podcast Date: June 21st, 2013 dw

June 20, 2013

[lodlam] Richard Wallis on Schema.org

Richard Wallis [twitter: rjw] of OCLC explains the appeal of Schema.org for libraries, and its place in the ecosystem.

Categories: everythingIsMiscellaneous, libraries, podcast Tagged with: bibframe • libraries • lodlam • metadata • schema.org Date: June 20th, 2013 dw

[lodlam] Richard Urban on LOD patterns

At the LODLAM conference, Richard Urban suggests that we build a pattern library so that people can identify common problems and common linked data solutions.

Categories: libraries, podcast Tagged with: libraries • linked data • lodlam • podcast Date: June 20th, 2013 dw

[lodlam] Corey Harper on designing LOD with users in mind

I videoed the opening of a session (liveblogged here) at LODLAM about trying to get past thinking about Linked Data as a way of stitching together resources, and instead trying to address user needs. Corey Harper led the session. Here are his opening remarks, recorded with his permission but in very low lighting that makes it look furtive.

Categories: libraries, podcast Tagged with: libraries • linked data • lodlam Date: June 20th, 2013 dw

[lodlam] Topics for Day 2

Here are the sessions people are proposing for the second day of the LODLAM conference in Montreal:

Getty Vocabulary goes open

Linked data on mobiles, wearable devices

Do cool things with the data sets that you have on your laptop – let’s build stuff!

Your tools and solutions

NLP for linked open data for libraries, archives, and museums. Data extraction, taxonomy alignment, context extraction, etc.

World War I in LOD

LOD and accessibility & assistive devices

The Pundit software package

the KARMA mapping tool

Tools and techniques for generating concordances between people

Why Schema.org?

Copying and synching linked data

FRBR and other standards [couldn’t hear]

How to create a new generation of LOD professionals. Getting students involved in projects.

The future of LODLAM

Normalizing data models and licensing models

The official list is here.

Categories: libraries, liveblog Tagged with: linked data • liveblog • lodlam Date: June 20th, 2013 dw
