Friday, April 6, 2018

117 ANS ebooks published to Digital Library

I have finally put the finishing touches on 117 ANS out-of-print publications that have been digitized into TEI (and made available as EPUB and PDF) as part of the NEH and Mellon-funded Open Humanities Book project. This is the "end" (more details on what an end entails later) of the project, in which about 200 American Numismatic Society monographs were digitized and made freely and openly available to the public.

All of these, plus a selection of numismatic electronic theses and dissertations as well as two other ebooks not funded by the NEH-Mellon project, are available in the ANS Digital Library. The details of this project have been outlined in previous blog posts, but to summarize, the TEI files have been annotated with thousands of links to people, places, and other types of entities defined in a variety of information systems--particularly Nomisma.org (for ancient entities), Wikidata, and Geonames (for modern ones).

Additionally:
  • Books have been linked to 153 coins (so far) in the ANS collection identified by accession number. Earlier books cite Newell's personal collection, bequeathed to the ANS and accessioned in 1944. A specialist will have to identify these.
  • 173 total references to coin hoards defined in the Inventory of Greek Coin Hoards, plus several from Kris Lockyear's Coin Hoards of the Roman Republic.
  • 166 references to Roman imperial coin types defined in the NEH-funded Online Coins of the Roman Empire.
  • A small handful of Islamic glass weights in The Metropolitan Museum of Art 
  • One book by Wolfgang Fischer-Bossert, Athenian Decadrachm, has a DOI, connected to his ORCID.
Since each of these annotations is serialized into RDF and published in the ANS archival SPARQL endpoint, the other various information systems (MANTIS, IGCH, OCRE, etc.) query the endpoint for related archival or library materials.

For example, the clipped shilling, 1942.50.1, was minted in Boston, but the note says it was found among a mass of other clippings in London. The findspot is not geographically encoded in our database (and therefore doesn't appear on the map), but this coin is cited in "Part III Finds of American Coins Outside the Americas" in Numismatic finds of the Americas.


Using OpenRefine for Entity Reconciliation

Unlike the first phase of the project, the people and places tagged in these books were extracted into two enormous lists (20,000 total lines) that were reconciled against Wikidata, VIAF, or Nomisma OpenRefine reconciliation APIs. Nomisma was particularly useful because of the high degree of accuracy in matching people and places. Wikidata and VIAF were useful for modern people and places, but these were more challenging in that there might be dozens of American towns with the same name or numerous examples of Charles IV or other regents. I had to evaluate the name within the context of the passage in which it occurred, a tedious process that took nearly two months to complete. The end result, however, has a significantly broader and more accurate coverage than the 85 books in the first iteration of the grant. After painstakingly matching entities to their appropriate identifiers, it only took about a day to write the scripts to incorporate the URIs back into the TEI files, and a few more days of manual, or regex linking for IGCH, ANS coins, etc.

As a result of this effort, and through the concordance between Nomisma identifiers and Pleiades places, there are a total of 3,602 distinct book sections containing 4,304 Pleiades URIs, which can now be made available to scholars through the Pelagios project.


What's Next for ANS Publications?

So while the project concludes in its official capacity, there is room for improvement and further integration. Now that the corpus has been digitized, it will be possible to export all of the references into OpenRefine in an attempt to restructure the TEI and link to URIs defined by Worldcat. We will want to link to other DOIs if possible, and make the references for each book available in Crossref. Some of this relies on the expansion of Crossref itself to support entities identifiers beyond ORCID (e.g., ISNI) and citations for Worldcat. Presently, DOI citation mechanisms allow us to build a network graph of citations for works produced in the last few years, but the extension of this graph to include older journals and monographs will allow us to chart the evolution of scientific and humanistic thought over the course of centuries.

As we know, there is never an "end" to Digital Humanities projects. Only constant improvement. And I believe that the work we have done will open the door to a sort of born-digital approach to future ANS publications.