June 20, 2013
[lodlam] Richard Wallis on Schema.org
Richard Wallis [twitter: rjw] of OCLC explains the appeal of Schema.org for libraries, and its place in the ecosystem.
Date: June 20th, 2013 dw
At the LODLAM conference, Richard Urban suggests that we build a pattern library so that people can identify common problems and common linked data solutions.
I videoed the opening of a session (liveblogged here) at LODLAM about trying to get past thinking about Linked Data as a way of stitching together resources, and instead trying to address user needs. Corey Harper led the session. Here are his opening remarks, recorded with his permission but in very low lighting that makes it look furtive.
June 19, 2013
Dean Krafft of Cornell talks about the status of VIVO, an interdisciplinary tool to help researchers discover one another.
This is from the LODLAM conference in Montreal.
Kevin Ford from the Library of Congress is talking about BIBFRAME, which he describes as a replacement for MARC and a rethinking of the entire ecosystem.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people. |
(If a response isn’t labeled “Kevin,” then it wasn’t Kevin. Also, this is much compressed, incomplete, and choppy. Also, I haven’t re-read it.)
Q: From the Bibframe mailing list it seems like there isn’t agreement about what Bibframe is trying to achieve.
Kevin: Sometimes people see it narrowly.
Q: It’s not clear how Bibframe gets to the point where it replaces MARC.
Kevin: We’re not holding back some plan or roadmap that we’ve mapped out perfectly with milestones and target dates. We’re taking it as it comes.
Q: There’s a perception on the part of vendors and customers of vendors that this is a new data specification that vendors will have to support, and that that’s its main function, and possibly that’s pushing the knowledge representation in a direction that’s favorable to the vendors — a direction that’s too simple.
Q: Is there an agreement about the end point?
Kevin: There’s agreement that it needs to do what MARC does but better. We’re doing data representation, not predicting the systems built on top of it.
Q: What are the functional requirements that Bibframe’s trying to meet with this new model? What are your metrics? And who are you trying to satisfy?
Kevin: It’s not vendor focused. We hope systems will be built that expose the data as linked data.
Q: Bibframe lets you associate a record with a particular work, which is a huge advance.
Q: Bibframe used to talk about roundtripping from MARC to Bibframe to MARC. But Bibframe is now adding info, so I don’t see how roundtripping is possible.
Kevin: Not losslessly.
Q: Bibframe is intended for libraries, but from what I’ve seen it doesn’t seem that Bibframe is intended for use outside of libraries. There doesn’t seem to be any thought about how other ontologies might be overlaid. And that was a problem with MARC: it was too library-centric. Why not investigate mapping it into other vocabularies?
Kevin: Nothing stops you from including other namespaces. As for mapping to other vocabularies, we’re working on a 40 year time scale and can’t know that other vocabularies will be around.
Q: We need some community-building to make that happen. We need to be careful not to build an ontological silo.
Q: The naming of this data set is unfortunate. Why “bib”, which has a connotation of books, when really it should be about any kind of information-bearing object? Why not call it “InfoFrame”? Who uses “bibliographic” other than libraries? Why limit yourself?
Kevin: I cannot begin to tell you how much time was spent on what this thing should be called. It went through a couple of different names. It’s not an ideal name, but I hope that the “bib” association falls by the wayside.
Q: The library ecosystem includes articles, licenses, and many other things that weren’t part of MARC. Is Bibframe aiming at representing all of that?
Kevin: Yes, it’s in scope. Certainly data about journal articles.
Kevin: Yes, Bibframe lets you define your own fields, as in MARC.
Q: We’re going from cataloging to catalinking: from records about resources to links related to topics, etc.
A: We need services that will link resources to other resources. Bibframe doesn’t do that, but it’s more amenable to it than MARC.
Kevin: [Sorry, but I missed the beginning of this.] When it comes to subject headings, we expect you to resolve that URI. If people are doing that every single time, then it’s a candidate for being included. That lookup could be a query into your local system. I’ve assumed you’ll have to have a local copy of it.
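Kevin’s idea of resolving subject-heading URIs, probably against a local copy, can be sketched as a cache-first lookup. The sketch also counts how often each heading gets resolved, since frequent lookups are his signal that a heading might belong in the record itself. This is a hypothetical illustration, not part of BIBFRAME: the `LocalVocabulary` class, the `fetch_remote` callback, and the example.org URIs are all made up.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


class LocalVocabulary:
    """Cache-first resolver for subject-heading URIs (hypothetical sketch)."""

    def __init__(self, fetch_remote: Callable[[str], Optional[str]]):
        self._labels: Dict[str, str] = {}  # local copy: URI -> preferred label
        self._fetch_remote = fetch_remote  # fallback, e.g. an HTTP lookup
        self.remote_calls = 0              # how often we had to leave the local system
        self.lookups = Counter()           # how often each URI is resolved

    def load(self, entries: Dict[str, str]) -> None:
        """Bulk-load a local copy of the vocabulary."""
        self._labels.update(entries)

    def resolve(self, uri: str) -> Optional[str]:
        """Return the label for `uri`, trying the local copy before the remote source."""
        self.lookups[uri] += 1
        if uri in self._labels:
            return self._labels[uri]
        self.remote_calls += 1
        label = self._fetch_remote(uri)
        if label is not None:
            self._labels[uri] = label      # keep it locally for next time
        return label

    def inclusion_candidates(self, threshold: int) -> List[str]:
        """URIs resolved at least `threshold` times: labels worth storing in the record."""
        return sorted(uri for uri, n in self.lookups.items() if n >= threshold)


def fake_remote(uri: str) -> Optional[str]:
    """Stand-in for a remote vocabulary service."""
    return "Landscape painting" if uri.endswith("/landscape") else None


vocab = LocalVocabulary(fake_remote)
vocab.load({"http://example.org/subjects/painting": "Painting"})
for _ in range(3):
    vocab.resolve("http://example.org/subjects/painting")
print(vocab.resolve("http://example.org/subjects/landscape"))  # fetched remotely once
print(vocab.resolve("http://example.org/subjects/landscape"))  # now served locally
print(vocab.remote_calls)
print(vocab.inclusion_candidates(threshold=3))
```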
Q: Versioning? Why did you ignore the work of the British Library?
Kevin: We didn’t ignore it at all. We need to attend to what’s achievable by the smallest institutions as well as the largest.
Q: For a small institution, is it practical to move away from MARC?
Kevin: Not for some. Some still use card catalogs. I expect some of the first systems will be an outward layer around legacy systems.
Q: We need a larger discussion about provenance and about trust on the semantic web. Libraries should be better participants in that discussion; it’s a deeply important space for us.
Q: This conversation makes me cynical about our profession’s involvement. We need to be talking with users. We need community involvement. We’re worried about the longevity of FOAF? It’ll outlast Bibframe because people actually use it. Let’s keep turning inward until we’re completely irrelevant.
Q: Yeah, the idea that there has to be one namespace seems so counter to the principles of linked data.
Q: Do we have anyone outside of the library community here?
A: I’m mainly a web developer. There’s a really big gulf. The Web will win when it comes to how libraries operate. Whether Bibframe will be a part of it remains to be seen. In the web community, everything seems exciting, but I feel so much angst in the library community.
Corey Harper [twitter:chrpr] starts a session by giving a terrific presentation of the problem: Linked data discussions and apps have focused too much on resources instead of on topics, narratives, etc. — what users are using resources to explore. We are not extracting all the value from librarians’ controlled vocabulary.
Some notes from the open discussion. Very sketchy, much choppier than in life, and highly incomplete.
Why not just use Solr, i.e., a Lucene-based search index? In part because Solr doesn’t know enough about the context, so a search for “silver” comes back with all sorts of hits without recognizing that some refer to the mineral, some to places with “silver” in the name, etc. By contrast, if you ask for “john constable artist birthdate,” linked data can get you the answer. [I typed that into Google. It came back with the answer in big letters.]
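The “silver” problem and the Constable question can both be illustrated with a toy graph. This is a hypothetical sketch, not how Solr or any particular linked data store works: the `ex:` identifiers and the `ex:Mineral`/`ex:Place` types are invented, and the only real fact encoded is Constable’s birth date (11 June 1776). Plain substring matching returns every label containing the term; grouping the same hits by `rdf:type` separates the mineral from the places, and following a named property answers the birthdate question directly.

```python
# Toy triples (subject, predicate, object); the ex: identifiers are made up.
TRIPLES = [
    ("ex:Silver", "rdf:type", "ex:Mineral"),
    ("ex:Silver", "rdfs:label", "silver"),
    ("ex:SilverCity", "rdf:type", "ex:Place"),
    ("ex:SilverCity", "rdfs:label", "Silver City"),
    ("ex:SilverLake", "rdf:type", "ex:Place"),
    ("ex:SilverLake", "rdfs:label", "Silver Lake"),
    ("ex:JohnConstable", "rdf:type", "ex:Artist"),
    ("ex:JohnConstable", "rdfs:label", "John Constable"),
    ("ex:JohnConstable", "ex:birthDate", "1776-06-11"),
]


def keyword_search(term):
    """Context-free matching: every subject whose label contains the term."""
    term = term.lower()
    return sorted({s for s, p, o in TRIPLES if p == "rdfs:label" and term in o.lower()})


def typed_search(term):
    """Group the same hits by rdf:type, so different kinds of 'silver' separate."""
    groups = {}
    for subject in keyword_search(term):
        for s, p, o in TRIPLES:
            if s == subject and p == "rdf:type":
                groups.setdefault(o, []).append(subject)
    return groups


def lookup(label, prop):
    """Find the entity with this exact label, then follow one property edge."""
    subjects = {s for s, p, o in TRIPLES if p == "rdfs:label" and o == label}
    return [o for s, p, o in TRIPLES if s in subjects and p == prop]


print(keyword_search("silver"))
print(typed_search("silver"))
print(lookup("John Constable", "ex:birthDate"))
```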
Linked data can do the sort of thing that reference librarians do: Here’s what you’re looking for, but have you also seen this and this and that?
How do we evaluate the user interfaces we come up with? How do we know if it’s helped someone find something, put something into context, tell a story…?
We have two weird paradigms in the library community: Lucene-based indexes of metadata (e.g., Blacklight) vs. exhibit makers (e.g., Omeka). How to bring those together so exhibits are made through an index, and the flow through them is itself indexed and made findable and re-usable. (And then there’s the walking through a room and discovering relationships among things.)
How do we preserve the value of the subject classifications? [Here’s one idea: Stacklife :) ]
It’s important to keep one of the core functions of the catalog: to identify resources and create identities for them. A lot of our examples are facts, but in the Humanities, what’s our role in maintaining identities around which we can hang relationships and record the disagreements among people? How do you help people navigate that problem space?
The Web’s taught us that the only way to find things is through search, but let’s remember the “link” in “linked data”: the ability to find the relationship between things you’ve found. E.g., the Google Knowledge Graph and Google fact panel are doing this to some degree. We’ve lost that, thanks to computers.
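Finding “the relationship between things you’ve found” amounts to finding a path between two nodes in the graph. A minimal sketch, assuming a tiny hand-made graph: the `ex:` identifiers and predicates are illustrative, and breadth-first search is just one simple way to do it (it returns a shortest chain of links and ignores edge direction).

```python
from collections import deque

# Hypothetical edges; entities and predicates are illustrative only.
EDGES = [
    ("ex:JohnConstable", "ex:painted", "ex:TheHayWain"),
    ("ex:TheHayWain", "ex:heldBy", "ex:NationalGallery"),
    ("ex:NationalGallery", "ex:locatedIn", "ex:London"),
    ("ex:JMWTurner", "ex:bornIn", "ex:London"),
]


def find_path(start, goal):
    """Breadth-first search treating each triple as a two-way link.

    Returns the triples connecting start to goal (shortest chain first
    found), or None if the two nodes are not connected.
    """
    neighbors = {}
    for s, p, o in EDGES:
        neighbors.setdefault(s, []).append((o, (s, p, o)))
        neighbors.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, triple used to reach it)
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while came_from[node] is not None:
                prev, triple = came_from[node]
                path.append(triple)
                node = prev
            return list(reversed(path))
        for nxt, triple in neighbors.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, triple)
                queue.append(nxt)
    return None


for triple in find_path("ex:JohnConstable", "ex:JMWTurner"):
    print(triple)
```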
People want to have debates and find conflicting information. It’s hard to bring this into a search interface.
The Digital Mellini project digitized a specialized manuscript and opened it up. Once something is digitized, it can reveal pieces you cannot see with the human eye, e.g., marginal notes.
Other examples of the sort of thing that Corey is talking about:
Linking Lives. EAC-CPF (Encoded Archival Context: Corporate Bodies, Persons, and Families).
SNAC (Social Networks and Archival Context; “Facebook for dead people”) mines finding aids to find social relationships.
LinkSailor (RIP) traversed many OWL sameAs relationships.
CultureSampo (Finnish)
Tim Sherratt’s group has something coming out soon
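LinkSailor’s trick of traversing OWL sameAs links rests on the fact that owl:sameAs is symmetric and transitive, so chains of links collapse into groups of identifiers for the same thing. A minimal union-find sketch, assuming made-up identifiers from three hypothetical datasets (this is not LinkSailor’s code):

```python
def same_as_clusters(links):
    """Group identifiers connected by owl:sameAs links (union-find).

    owl:sameAs is symmetric and transitive, so any chain of links
    puts all of its identifiers into one cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sameAs links between three made-up datasets.
LINKS = [
    ("museumA:artist/42", "authorityB:constable-john"),
    ("encyclopediaC:John_Constable", "authorityB:constable-john"),
    ("museumA:artist/7", "authorityB:turner-jmw"),
]

for cluster in same_as_clusters(LINKS):
    print(cluster)
```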
People think that museum web sites are boring. At LODLAM we’re a bunch of data geeks and are the wrong people to be talking about user interfaces. Response: We should take the Apple route and give people what they don’t know they want. We should also be testing our models against how people think about the world.
“I have a lot of data. It’s very sparse and sometimes very concentrated. It’s hard to know what users want from it. I don’t know what’s going to be important to you. So we generate video games, using geodata to create the playing field.” That’s not a retrieval engine, but it’s a way to make use of the factoids.
Read “The Lean Startup.” The Minimum Viable Product is an important idea. Don’t underrate the role of the product owner in shaping a great project. (Me:) Having strong, usable graphs that take advantage of what libraries know would be helpful.
Who are our clients? Users? Scholars? Developers? A: All of them. Response: Then we’ll fail. Response: Catalogs were designed to manage collections, not for the general public. People have been forced to learn how to use them; you have to understand the collection’s abstraction. And that’s not sustainable.
Our library wants to build the graph. We build simple interfaces to demonstrate the power, but our value is in building the graph.
We don’t want to deliver linked data to users. We want to build the layer between the linked data and the apps. If we do it well, users won’t know or care that there’s linked data underneath it.
We tend to focus on what we think our users should want. It’s an “eat your broccoli” approach to search. E.g., users want social networks, but many scholars resist it because it seems too non-rigorous.
Jon Voss, an organizer of the LODLAM conference in Montreal, talks about what we can learn about the current state of Linked Data for libraries, archives, and museums by looking at the topics proposed at this unconference:
KARMA, from the University of Southern California, takes data from a wide variety of sources, maps it to your ontologies, and generates linked data. It is open source and free. [I have not even re-read this post. Running to the next session.]
They are demo-ing it using a folder full of OWL ontology files. [OWL files contain the rules that define ontologies.] KARMA runs in your browser. The mapping format is R2RML, which is designed for relational databases, but they’ve extended it to handle more types of sources. You can import from a database, files, or a service.
For the demo, they’re using CSV files from a Smithsonian database that consists of display names, IDs representing unique people, and a variant or married name. They want to map it to the Europeana ontology. KARMA shows the imported CSV and lets you (for example) create a URI for every person’s name in the table. You can use Python to transform the variant names into a standard name ontology, e.g. transforming “married name” into aac-ont:married (American Art Consortium).
You can model the data and it learns from what you do. E.g., it asks if you want to map the original’s ConstituentID to saam-ont:constituentID or saam-ont:objectId. (It recognizes that the ID is all numerals.) There’s an advanced option that lets you map it to, for example, a URI for aac-ont:Person1.
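The core of what KARMA automates here, minting a URI per row and mapping columns (some after a Python transformation) onto ontology properties, can be shown in plain Python. This is a hypothetical sketch, not KARMA’s API and not R2RML: the sample rows, the column names, and the example.org namespace are invented; the `saam-ont:`, `aac-ont:`, and `foaf:` property names come from the demo as described, but how they are assigned here is an assumption.

```python
import csv
import io

# Invented sample rows shaped like the demo's table.
SAMPLE_CSV = """ConstituentID,DisplayName,VariantName,VariantType
101,Jane Doe,Jane Smith,married name
102,John Roe,,
"""

PERSON_BASE = "http://example.org/person/"  # hypothetical namespace for minted URIs

# Hypothetical transformation table: variant-name types -> ontology properties.
VARIANT_PROPERTIES = {"married name": "aac-ont:married"}


def rows_to_triples(csv_text):
    """Map each CSV row to (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        person = PERSON_BASE + row["ConstituentID"]  # mint one URI per person
        triples.append((person, "saam-ont:constituentID", row["ConstituentID"]))
        triples.append((person, "foaf:name", row["DisplayName"]))
        # The "Python transformation" step: normalize the variant type, then map it.
        variant_type = (row.get("VariantType") or "").strip().lower()
        prop = VARIANT_PROPERTIES.get(variant_type)
        if prop and row.get("VariantName"):
            triples.append((person, prop, row["VariantName"]))
    return triples


PERSON_TRIPLES = rows_to_triples(SAMPLE_CSV)
for triple in PERSON_TRIPLES:
    print(triple)
```

Keeping the mapping in one function means the same logic can be rerun unchanged over each new export, which is roughly what saving a mapping for batch use buys you.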
He clicks on the “display name” and KARMA suggests that it’s a SKOS altLabel, or a FOAF name, etc. If there are no useful suggestions, you can pick one that’s close and then edit it. You can browse the ontologies in the folders you’ve configured it to load. You can have synonyms (“a FOAF person can be a SKOS person.”) [There’s yet more functionality, but this is where I topped out.]
You can save this as a process that can be run in batch mode.
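KARMA’s suggestions (an all-numeral ID offered as saam-ont:constituentID or saam-ont:objectId, a display name offered as a SKOS altLabel or FOAF name) come from a model that learns as you map. As a stand-in for that learned model, here is a hypothetical sketch that ranks candidate properties with hand-written tests on a column’s values; the tests and the candidate list are assumptions, not KARMA’s method.

```python
import re

# Hypothetical candidates, each with a crude test of whether a value "looks right".
CANDIDATES = {
    "saam-ont:constituentID": str.isdigit,
    "saam-ont:objectId": str.isdigit,
    "foaf:name": lambda v: re.fullmatch(r"[A-Z][a-z]+( [A-Z][a-z]+)+", v) is not None,
    "skos:altLabel": lambda v: any(ch.isalpha() for ch in v),
}


def suggest(values, top=3):
    """Rank candidate properties by the share of non-empty values each test accepts."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return []
    scores = {
        prop: sum(test(v) for v in values) / len(values)
        for prop, test in CANDIDATES.items()
    }
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [prop for prop, score in ranked[:top] if score > 0]


print(suggest(["101", "102", "345"]))
print(suggest(["Jane Doe", "John Roe"]))
```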
I’m at LODLAM (linked open data for libraries, archives, and museums) in Montreal. It’s an unconference with 100 people from 16 countries. Here are the topics being suggested at the opening session. (There will be more added to the agenda board.)
(Because this is an unconference, I probably will not be doing much more liveblogging.)
Taxonomy alignment
How to build a case for LOD
How to build a pattern library (a clear articulation for a problem, the context where the problem appears, and a pattern for its solution) for cultural linked open data
How to take PDF to the next level, integrating triples to make it open data? How to make it into a “portable data format”?
How can we efficiently convert our data to LOD? USC has Karma and would like to convene a workshop about tools.
How to convert simple data to LOD? How to engage users in making that data better?
A cultural heritage standard.
User interfaces. What do we do after we create all of this data? [applause]
Progress since the prior LODLAM (in San Francisco)? BIBFRAME? Schema.org?
Preserving linked data
The NSA has built the ultimate linked data tool chain. What can we learn?
Internal use cases for linked data.
How to make use of dirty metadata
A draft ontology for MODS metadata (MODSRDF)
Collaborating on a harvesting/enrichment tool
The Getty Vocabularies are being released as LOD [applause], but they need help building a community: making sure they have the right ontologies, early adopters, etc.
The data exhaust from DSpace, and linking it to world problems: finding the disconnects between the people who have problems and the people with information helpful for those problems
Identities and authorities — linked data as an app-independent way of doing identity control and management
RDF cataloging interface
Curation and social relationships
Linked Open Data ecosystems
A new understanding of search: the ways LODers search aren’t familiar to most people
Open Annotation tools enabling end users to enrich the graph
Our collections are different for a reason. That manifests itself in the data structure. We should talk about this.
In the business writ large, maybe we need the confidence to be invisible. What does that mean?
Feedback loops once data has been exposed
Wikidata — the database that supports Wikipedia
Forming an international group to discuss archival data, particularly in LOD
June 13, 2013
Both Facebook and Apple have announced the use of tags. Yay!
Tags have continued to percolate through the ecosystem after their most auspicious introduction in Delicious.com. (Note the phrase “most auspicious”; tags have always been with us.) It’s great to see them spread, both because they are a great way to get use out of the craziness while preserving it in its original form for others, and because there is great value in scaling tags, as Flickr has shown.
So, yay for tags. And yay for the crazy.