Friday, December 4, 2015

The ANS Digital Library, a Look Under the Hood

The ANS announced the launch of its Digital Library a few months ago. There are only a few items in the repository at the moment, but we will be expanding it in the very near future to include journal articles and open access e-books. This blog post introduces some of the technical concepts behind ETDPub, the open source framework that powers the Digital Library.

The idea that initially drove our framework was the desire to make numismatic theses and dissertations more widely and freely accessible. Andrew Reinhard, ANS Director of Publications, came to me in the late summer to put together something very basic that we could launch at the INC in Taormina in late September. At first, I looked into an off-the-shelf tool called Vireo, developed by the Texas Digital Library. However, this platform was designed for the phases of dissertation review and publication into an institutional repository at a university. It is a back end only, with no public front end to speak of. The only solution was to build something effective quickly. The basic specifications for ETD publication were: an interface for basic metadata entry, an upload mechanism for PDFs or other documents, and a front end to provide the public with access to the documents.

Since I've done a lot of XForms development on top of library metadata standards in the past, and since nearly all of our applications are already built on XRX/SPARQL design concepts in Orbeon, we opted to use Orbeon for this framework as well. We put together a basic MODS template for electronic theses and dissertations and an XForms editor to handle data entry, document upload, and web service interaction. As in EADitor, xEAC, Numishare, etc., there are lookup mechanisms for the Getty LOD thesauri, Geonames, VIAF, Nomisma.org, Pleiades for ancient geography, and LCSH from the Library of Congress. It even includes lookups for authority records from a xEAC installation (like EADitor). The first version of the framework went from development to production in about a week.
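To give a rough idea of what such a MODS template holds, here is a minimal sketch in Python. It is not the actual ETDPub template, which lives in XForms. The function name `build_mods`, the sample title, and the example.org URIs are all hypothetical. Only the MODS namespace and the `valueURI` attribute (used to carry the URI chosen from a lookup) come from the MODS schema itself.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title: str, author: str, author_uri: str,
               topics: dict[str, str]) -> ET.Element:
    """Build a small MODS record (hypothetical helper, not ETDPub code).

    `topics` maps a human-readable label to the linked-data URI that a
    lookup (e.g. against a thesaurus) would have returned.
    """
    mods = ET.Element(q("mods"))

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    # The author is linked to an authority record through valueURI.
    name = ET.SubElement(mods, q("name"),
                         {"type": "personal", "valueURI": author_uri})
    ET.SubElement(name, q("namePart")).text = author

    ET.SubElement(mods, q("genre")).text = "thesis"

    for label, uri in topics.items():
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic"), {"valueURI": uri}).text = label

    return mods


# All values below are placeholders for illustration only.
record = build_mods(
    title="An Example Numismatic Thesis",
    author="Doe, Jane",
    author_uri="http://example.org/authority/doe",
    topics={"Example topic": "http://example.org/concept/example"},
)
print(ET.tostring(record, encoding="unicode"))
```

The point of storing the URI next to the label is that later publication steps can follow the link instead of matching on strings.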



Saving the MODS file writes it to an eXist XML database, publishes the metadata to Solr, and indexes the document file into Solr for full-text searching using the ExtractingRequestHandler. Yesterday, I extended the publication functionality to serialize MODS into RDF and post the triples to a SPARQL endpoint. This draws content from our Digital Library into our archival platforms built on EADitor and xEAC. We are digitizing auction catalogs, books, and journals edited or authored by prominent numismatic scholars who also played a role in the Society, and who therefore have EAC-CPF records in the ANS Biographies service. For example, our Digital Library contains one auction catalog edited by Edgar H. Adams. The metadata from this catalog are published to the SPARQL endpoint, and two items from our archive (an EAD finding aid and a photograph described in MODS) are also available from the biographical page in the Adams authority record. This is the ideal model for larger-scale aggregation of cultural heritage content associated with archival authorities. It is nearly impossible to maintain these connections by hard-coding resourceRelation elements in the EAC-CPF record.
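The MODS-to-RDF step can be sketched roughly as follows. This is a simplified illustration in Python, not the ETDPub implementation. The choice of Dublin Core Terms properties, the helper names, and the example.org URIs are assumptions made for the example; the RDF model that ETDPub actually emits is not described here. The sketch only builds the SPARQL Update text and does not send it anywhere.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"
DCTERMS = "http://purl.org/dc/terms/"  # assumed vocabulary for this sketch


def escape_literal(value: str) -> str:
    """Escape a string for use inside a double-quoted RDF literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def mods_to_triples(mods_xml: str, record_uri: str) -> list[str]:
    """Turn a MODS record into a few N-Triples lines (hypothetical helper)."""
    root = ET.fromstring(mods_xml)
    triples = []

    title = root.find(f"{MODS}titleInfo/{MODS}title")
    if title is not None and title.text:
        triples.append(
            f'<{record_uri}> <{DCTERMS}title> "{escape_literal(title.text)}" .')

    # Only names that carry an authority URI become links; this is what
    # lets an authority page find the record without hard-coded relations.
    # dcterms:creator is used for every name here, which is a simplification.
    for name in root.findall(f"{MODS}name"):
        uri = name.get("valueURI")
        if uri:
            triples.append(f"<{record_uri}> <{DCTERMS}creator> <{uri}> .")

    return triples


def to_sparql_update(triples: list[str]) -> str:
    """Wrap triples in a SPARQL 1.1 INSERT DATA request body."""
    body = "\n".join("  " + t for t in triples)
    return "INSERT DATA {\n" + body + "\n}"


# Placeholder record and URIs, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Auction Catalog</title></titleInfo>
  <name type="personal" valueURI="http://example.org/authority/editor">
    <namePart>Example, Editor</namePart>
  </name>
</mods>"""

triples = mods_to_triples(SAMPLE, "http://example.org/id/catalog1")
update = to_sparql_update(triples)
print(update)
```

With triples like these in the endpoint, a biographical page only needs to ask for every resource whose creator is the authority URI, so new Digital Library items show up without anyone editing the EAC-CPF record.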

So now we have three standalone software frameworks that comprise our digital library and archive, all connected together via linked open data methodologies. The next step is to begin integrating coins from our collection into this broader network of numismatic information.

We will begin this work soon with the digitization of ANS monographs. These books contain references to coins in our collection, to hoards published on coinhoards.org, to materials in our archive, and to numismatic concepts defined on nomisma.org.

ETDPub already supports the publication of TEI and dynamic serialization of TEI into EPUB 3.0.1.

More details soon.
