Thursday, August 10, 2017

First DOIs minted for ANS Digital Library items

Several weeks ago, we migrated an older, circa 2002 TEI ebook on the Taranto 1911 hoard, authored by John Kroll and Sebastian Heath, into our Digital Library. The original TEI file and subsequent updates have been loaded into our TEI Github repository. The updates follow transcription precedents that we have set in older ANS-published printed monographs as part of the Mellon-funded Open Humanities Book Program: relevant places, objects, people, etc. have been linked to entities in LOD systems, such as Nomisma.org. All of the objects within this hoard (itself linked to IGCH 1864) are in the British Museum and linked to their URIs. Upon publication into the ANS Digital Library, the document parts are now accessible from the IGCH 1864 record and in (eventually) in Pelagios, connected to relevant ancient places.

Since Sebastian is an active scholar, with an ORCID, this document served as a proof of concept for the next iteration of ANS digital publication: that our current and future monographs and journal articles, once issued openly online, should be connected to ORCIDs for their authors, and publication metadata should be submitted to Crossref to mint a DOI and enhance accessibility. Furthermore, since there's a direct connection between ORCID and Crossref submissions, this new digital publication workflow would automatically populate an author's scholarly profile with ANS publications. This is a vast improvement over the likes of Academia.edu, which requires manual submission. The broad vision is this:

Regardless of whether an author submits works through the American Numismatic Society Digital Library, Zenodo.org, Humanities Commons, their own institutional repository, or an Open Access journal system, their ORCID profile is the central, canonical aggregation of the entirety of their intellectual output (which includes datasets, software, etc.).

This aggregation system between DOIs and ORCIDs, following Linked Open Data principles, is the future of academic publication. Ideally, it should be expanded beyond citations to modern works with DOIs and ORCIDs to include more historic works defined by Worldcat and linked to historic scholars with ISNI identifiers. It would take a tremendous amount of work, but in theory, it would be possible to create a network graph of citations across all disciplines, going back in history to the advent of the printed book, charting the evolution of how knowledge is generated and disseminated. Therefore, Crossref, ISNI, and ORCID would perhaps play a greater role than providing simple (and superficial) citation metrics in enabling us to develop a broader historiography and analysis of scholarship itself. We plan to mint DOIs for our historical publications eventually, if Crossref extends its XML schema to support ISNI identifiers.

Under the Hood

Some extensions were implemented in ETDPub, the TEI/MODS publication framework that underlies the ANS Digital library. First, I authored XSLT stylesheets that would crosswalk TEI or MODS into the appropriate Crossref XML model according to their schema version 4.4.0. You can see an example of my MA thesis here: http://numismatics.org/digitallibrary/ark:/53695/gruber_roman_numismatics.xref.

XSLT:
If the author/editor URI matches an ORCID URI in the TEI, then the Admin panel in ETDPub will enable the publication of the metadata to Crossref. Similarly, within the MODS ETD editing interface (in XForms), a user can insert a mods:nameIdentifier[@type='orcid'] under the mods:name for an author/editor in order to capture the ORCID. So far, only TEI or MODS records with ORCIDs attached to people are available for submission into Crossref to mint a DOI.

Submission Workflow

In the admin panel, if a document is eligible for submission to Crossref, a checkbox is available. Clicking on this will fire off a series of actions in the XForms engine:
  1. The TEI/MODS-to-Crossref XML transformation is executed and loaded into an XForms instance
  2. The Crossref XML is serialized to /tmp because it must be attached via multipart/form-data
  3. Still having difficulty getting multipart/form-data to execute correctly in the XForms engine, the XForms engine instead interacts with a PHP script in CGI
  4. After the PHP script responds with a successful HTTP code, the MODS/TEI document is loaded in the XForms engine in order to insert the DOI in the proper location within the document
  5. The TEI/MODS file is saved back to eXist, and the standard publication workflow is executed (a chain of XForms submissions), updating the Solr search index and the triplestore/SPARQL endpoint
So far two documents in the Digital Library have DOIs connected to ORCIDs:

Taranto 1911: http://dx.doi.org/10.26608/taranto1911
My thesis (Recent Advancements in Roman Numismatics): http://dx.doi.org/10.26608/gruber_roman_numismatics