Wednesday, January 13, 2016

First EBook published to ANS Digital Library

This afternoon we published our first EBook to the ANS Digital Library in the ETDPub framework. This EBook, Medallic Art of the American Numismatic Society, 1865–2014 by Scott Miller, is encoded in TEI and has been issued under a Creative Commons BY-NC license. The TEI file has not yet been fully linked to name and place authority files, but I was able to use regular expressions to link references to medals in the American Numismatic Society collection, and I linked one prominent scholar, Edward T. Newell, to our archival authority record. The rest of the linking will follow later. Even so, this EBook serves as a functional demonstration of how linked open data principles can be applied to publishing these types of books for the larger NEH-Mellon Humanities Open Book project.
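
As a rough illustration of the regex linking step, here is a minimal Python sketch. It is not the code used for this book: the accession number pattern (year.lot.item, such as 1940.100.45), the base URI, and the function name link_accessions are all assumptions made for the example.

```python
import re

# Assumed pattern: accession numbers written as year.lot.item, e.g. 1940.100.45
ACCESSION = re.compile(r"\b(\d{4}\.\d{1,4}\.\d{1,5})\b")

# Assumed base URI for collection records
BASE_URI = "http://numismatics.org/collection/"


def link_accessions(tei_text: str) -> str:
    """Wrap each bare accession number in a TEI <ref> pointing at its URI."""
    return ACCESSION.sub(
        lambda m: f'<ref target="{BASE_URI}{m.group(1)}">{m.group(1)}</ref>',
        tei_text,
    )


if __name__ == "__main__":
    print(link_accessions("<p>See ANS 1940.100.45 for the silver example.</p>"))
```

A substitution like this is not idempotent: running it a second time would wrap numbers that are already inside a ref element, so it is best applied once to the unlinked text.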

The TEI is indexed into Solr by ETDPub for full-text search, but this is only the beginning of the system's features. Using Orbeon's XPL pipelines, we are able to chain together a series of XSLT transformations that turn the TEI file into the XHTML and other XML files (NCX, OPF) required by the EPUB 3.0.1 specification. The EPUB file is generated dynamically, and there is a download link on the page for the EBook. I have tested it in an ereader application on my desktop (Ubuntu) and in a few on my Android phone. They mostly seem to work well. The table of contents isn't consistently functional, but this seems to be a matter of individual apps not supporting EPUB 3.0.1 correctly rather than a problem with the EPUB file itself. I plan to put together a survey to assist in usability testing.
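
For readers unfamiliar with how the pieces of an EPUB fit together, the packaging step can be sketched in Python. This is only an illustration of the container layout, not how ETDPub does it (ETDPub uses XPL and XSLT). The function name build_epub, the OEBPS/ directory, and the file names are assumptions; the one fixed rule shown is that the mimetype entry comes first in the zip and is stored uncompressed.

```python
import io
import zipfile

# container.xml tells the reading system where the OPF package file lives.
# The OEBPS/content.opf path is an assumed layout for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(documents):
    """Zip already-transformed files (path -> str or bytes) into EPUB bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must be first and must not be compressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (OPF, NCX, XHTML chapters) can be deflated.
        for path, data in documents.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

The documents argument would hold the output of the transformations, for example {"OEBPS/content.opf": ..., "OEBPS/toc.ncx": ..., "OEBPS/chapter1.xhtml": ...}. Because the archive is built in memory, the same approach suits generating the download on request.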

It is also important to note that the focus of the EPUB serialization so far has been almost solely on functionality. The XSLT stylesheets are very basic and I have applied almost no CSS styling, so there is plenty of room to improve the overall look of the document. That will come later, as functional issues are ironed out. I am aware that the tables do not render properly.

The other major feature of ETDPub's TEI publishing is serialization into RDF (RDF/XML so far, but JSON-LD and Turtle outputs are coming). This RDF is still fairly rudimentary. It contains a data object for the book as a whole (with associated metadata in dcterms, like creator and publisher) and one for each child div, using dcterms:isPartOf to link the hierarchical structure of the book together. Furthermore, any link (ref element) within the lowest-level relevant div, and any name that has been linked via the @corresp attribute to an authoritative URI in the teiHeader, is rendered as an annotation following the Open Annotation model. ETDPub is capable of executing CRUD operations against a SPARQL 1.1-compliant endpoint, and so the Digital Library application is posting into the triplestore that links our archival objects and authority records together. I have previously discussed linking xEAC and EADitor together via SPARQL, and now ETDPub is capable of doing the same. The authority record for Newell now includes links to the sections of the ANS medals book in which he is mentioned, in addition to the research notebooks and photographs associated with Newell that are held in the ANS archives. We are moving forward with linking our library, archives, and collection more closely together internally, as well as paving the way for scholars to gain further context from data sources outside the ANS.
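
To make the shape of this RDF more concrete, here is a minimal Python sketch of the triples for one section and of a SPARQL 1.1 Update request that would insert them. It is an approximation under stated assumptions rather than ETDPub's actual output: the function names, the book URI, the annotation URI pattern, and the Newell authority URI used in the example are placeholders. Only the dcterms:isPartOf, oa:hasBody, and oa:hasTarget properties come from the vocabularies named above.

```python
DCTERMS = "http://purl.org/dc/terms/"
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def section_triples(book_uri, section_id, linked_uris):
    """Return (subject, predicate, object) URI triples for one div of the book."""
    section_uri = f"{book_uri}#{section_id}"
    # Tie the section back to the book as a whole.
    triples = [(section_uri, DCTERMS + "isPartOf", book_uri)]
    for n, uri in enumerate(linked_uris, start=1):
        # Hypothetical URI pattern for the annotation itself.
        annotation = f"{book_uri}/annotation/{section_id}-{n}"
        triples += [
            (annotation, RDF_TYPE, OA + "Annotation"),
            (annotation, OA + "hasBody", uri),             # the linked entity
            (annotation, OA + "hasTarget", section_uri),   # where it is mentioned
        ]
    return triples


def to_insert_data(triples):
    """Serialize URI-only triples as a SPARQL 1.1 Update INSERT DATA request."""
    lines = [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    book = "http://example.org/book/medallic-art"        # placeholder URI
    newell = "http://numismatics.org/authority/newell"   # assumed authority URI
    print(to_insert_data(section_triples(book, "sec1", [newell])))
```

The resulting string would be sent to the endpoint's update service. With these triples in the store, a query for annotations whose body is the Newell URI returns the sections that mention him, which is the kind of lookup an authority record page needs.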