Thursday, January 19, 2012

BHL and Linked Data at ALA Midwinter

At the next-to-last minute I was invited to join a panel at the American Library Association Midwinter meeting this weekend in Dallas, TX. Awesome: it gives me an opportunity to talk about BHL's experience assigning DOIs to legacy literature, and I want to demonstrate CrossRef's Linked Data integration for DOIs:

Press Release:
http://www.crossref.org/crweblog/2011/04/crossref_and_international_doi.html

Tech info:
http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html

The DOI I'm demo'ing is associated with "The amoebae living in man; a zoological monograph" because that's a great title!
It's online at BHL at: http://www.biodiversitylibrary.org/bibliography/10172
And its DOI is: 10.5962/bhl.title.10172

I've used Disco & Marbles and am getting nowhere fast.

The irony that I'm giving a talk on Linked Data, am a Tech Director, and am asking this is not lost on me. I'm also not too proud to admit gaps in my knowledge and to reach out for assistance when needed. Basically I'm just interested in retrieving some RDF for the DOI and grabbing a screenshot of the response. Anyone interested in helping out and getting mad props at ALA, on Twitter, on Slideshare, and everywhere else I blab on about?
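For anyone who wants to try this from a script rather than a Linked Data browser, here is a minimal sketch of the kind of request the CrossRef tech info describes: ask the DOI resolver for the DOI, but send an Accept header requesting RDF instead of HTML. The resolver URL (https://doi.org/), the application/rdf+xml media type, and the function names are assumptions made for illustration, not something confirmed for this particular DOI.

```python
from urllib.request import Request, urlopen

# Assumption: the DOI resolver lives at this base URL.
DOI_RESOLVER = "https://doi.org/"

# Assumption: RDF/XML is one of the media types the registration agency serves.
RDF_XML = "application/rdf+xml"


def build_rdf_request(doi, resolver=DOI_RESOLVER, media_type=RDF_XML):
    """Build (but do not send) a content-negotiated request for a DOI."""
    return Request(resolver + doi, headers={"Accept": media_type})


def fetch_rdf(doi):
    """Send the request, following redirects, and return the body as text.

    urlopen raises urllib.error.HTTPError on a 4xx or 5xx response,
    so a server-side failure surfaces as an exception here.
    """
    with urlopen(build_rdf_request(doi), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling fetch_rdf("10.5962/bhl.title.10172") would then either return the RDF or raise an error that shows the HTTP status code, which is handy for the screenshot.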

***UPDATE 20 Jan 2012***
I found it surprising that everyone (4 people) errored out, so I sent a support question to CrossRef. Turns out their API had a bug! They weren't returning results for DOIs assigned to books that don't have an ISBN, and much of BHL is ISBN-less. Bowker is the ISBN registration agency in the US and we pose too much of a weird case for them; we have never been able to move further on assigning ISBNs to legacy content, and probably won't. Anyway, CrossRef were glad we pointed out this anomaly and they'll have a fix out early next week. Glad I asked, rather than assuming it was me & giving up. Thanks to all who provided input & assistance: @asaletourneau, @cajunjoel, @rdmpage. And my talk is online at: http://www.slideshare.net/chrisfreeland/bhl-assigning-dois-other-identifiers-to-legacy-literature

4 comments:

Joel said...

So I tried this using the directions in the tech info document and their example works as expected, but using the BHL DOI, I get a server error.

My first guess will be to suggest that maybe there's something wrong or missing with the BHL DOI, but I don't know. I'd start with looking to see if content negotiation is enabled, whatever that means in terms of doi.org.

Sorry this didn't work out of the box, but it looks promising!

--Joel

Chris Freeland said...

Thanks for confirming I wasn't missing something obvious! I thought it...would just work. Will dig deeper with CrossRef.

Rod Page said...

I get the same thing, a 500 server error. It looks like the problem is with CrossRef, as doi.org sends the request to data.crossref.org, which then returns the 500.

HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Location: http://data.crossref.org/10.5962%2Fbhl.title.10172
Expires: Fri, 20 Jan 2012 16:53:27 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 186
Date: Fri, 20 Jan 2012 07:06:54 GMT
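To make the redirect step concrete, here is a small sketch that reads a raw response like the one above, pulls out the status code and the Location header, and percent-decodes the path to recover the DOI. The function names and the SAMPLE constant are hypothetical, and the header parsing is deliberately simple (one header per line, no folded or repeated headers).

```python
from urllib.parse import unquote, urlsplit

# Two lines copied from the response above, used only as sample input.
SAMPLE = """HTTP/1.1 303 See Other
Location: http://data.crossref.org/10.5962%2Fbhl.title.10172"""


def parse_response_head(raw):
    """Return (status_code, headers) from a raw HTTP response head."""
    lines = raw.strip().splitlines()
    # Status line looks like: HTTP/1.1 303 See Other
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def doi_from_location(location):
    """Recover the DOI from a redirect target by decoding %2F back to '/'."""
    return unquote(urlsplit(location).path.lstrip("/"))
```

So the 303 simply points the client at data.crossref.org with the DOI percent-encoded in the path; the 500 comes from the request to that second URL.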

Without wishing to sound like a broken record, the other issue is why bother with linked data? Obviously I know the stock answer, but the reality is that most "linked data" isn't linked. If you look at CrossRef RDF for an article the only external link (eventually) is to http://periodicals.dataincubator.org/ via the ISSN. Author names have arbitrary, non-resolvable URIs. So effectively the data is a silo. Linked data, yes, RDF, yes, but still a silo. BHL RDF from CrossRef will be even more siloed because it's not at article level so lacks the potential to be linked via ISSNs.

My point is that unless the RDF has external links to other linked data sources, we have a bad case of the Emperor's new clothes. If BHL is serious about RDF it will have to do it itself, with a focus on links. For example, author names should get external URIs wherever possible. Given that many BHL authors are "important" they will have Wikipedia entries, which means they will have a DBpedia URI. Many journals will have ISSNs, which don't have an "official" URI but the http://periodicals.dataincubator.org/ URIs could be used (which would mean DOIs could be aggregated by ISSN). Items that exist in other major catalogues (e.g., Library of Congress, Open Library, etc.) should be linked to those records. Then there is article level data. A good number of articles within BHL items have external DOIs (e.g., Annals and Magazine of Natural History), which means part-whole links could be created using DOIs as URIs.

The notion that all we need do is pump out RDF and things will magically coalesce is frankly a myth. I fear we will have a few more years of people sprinkling "linked data" pixie dust before waking up to the reality that this stuff depends on links, and links require a lot of work to create. Without them we're just creating silos, using one of the ugliest data formats possible.

asaletourneau said...

A very valid point Rod. I think anyone (including myself!) who is just starting out exploring LOD needs this kind of a reminder that while Open Data is great, that's just the start of the journey. Will be going through our semantic wiki pages come Monday to ensure we are providing links through to other data where possible.