Thursday, August 10, 2017

First DOIs minted for ANS Digital Library items

Several weeks ago, we migrated an older, circa 2002 TEI ebook on the Taranto 1911 hoard, authored by John Kroll and Sebastian Heath, into our Digital Library. The original TEI file and subsequent updates have been loaded into our TEI GitHub repository. The updates follow transcription precedents that we have set in older ANS-published printed monographs as part of the Mellon-funded Open Humanities Book Program: relevant places, objects, people, etc. have been linked to entities in LOD systems, such as Nomisma.org. All of the objects within this hoard (itself linked to IGCH 1864) are in the British Museum and linked to their URIs. Upon publication into the ANS Digital Library, the document parts are now accessible from the IGCH 1864 record and (eventually) in Pelagios, connected to relevant ancient places.

Since Sebastian is an active scholar, with an ORCID, this document served as a proof of concept for the next iteration of ANS digital publication: that our current and future monographs and journal articles, once issued openly online, should be connected to ORCIDs for their authors, and publication metadata should be submitted to Crossref to mint a DOI and enhance accessibility. Furthermore, since there's a direct connection between ORCID and Crossref submissions, this new digital publication workflow would automatically populate an author's scholarly profile with ANS publications. This is a vast improvement over the likes of Academia.edu, which requires manual submission. The broad vision is this:

Regardless of whether an author submits works through the American Numismatic Society Digital Library, Zenodo.org, Humanities Commons, their own institutional repository, or an Open Access journal system, their ORCID profile is the central, canonical aggregation of the entirety of their intellectual output (which includes datasets, software, etc.).

This aggregation system between DOIs and ORCIDs, following Linked Open Data principles, is the future of academic publication. Ideally, it should be expanded beyond citations to modern works with DOIs and ORCIDs to include more historic works defined by Worldcat and linked to historic scholars with ISNI identifiers. It would take a tremendous amount of work, but in theory, it would be possible to create a network graph of citations across all disciplines, going back in history to the advent of the printed book, charting the evolution of how knowledge is generated and disseminated. Therefore, Crossref, ISNI, and ORCID would perhaps play a greater role than providing simple (and superficial) citation metrics in enabling us to develop a broader historiography and analysis of scholarship itself. We plan to mint DOIs for our historical publications eventually, if Crossref extends its XML schema to support ISNI identifiers.

Under the Hood

Some extensions were implemented in ETDPub, the TEI/MODS publication framework that underlies the ANS Digital Library. First, I authored XSLT stylesheets that crosswalk TEI or MODS into the appropriate Crossref XML model according to their schema version 4.4.0. You can see an example generated from my MA thesis here: http://numismatics.org/digitallibrary/ark:/53695/gruber_roman_numismatics.xref.

XSLT:
If the author/editor URI matches an ORCID URI in the TEI, then the Admin panel in ETDPub will enable the publication of the metadata to Crossref. Similarly, within the MODS ETD editing interface (in XForms), a user can insert a mods:nameIdentifier[@type='orcid'] under the mods:name for an author/editor in order to capture the ORCID. So far, only TEI or MODS records with ORCIDs attached to people are available for submission into Crossref to mint a DOI.
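
The eligibility rule described above can be sketched in a few lines. This is only an illustration, not the ETDPub code (which is XSLT and XForms): the function names are hypothetical, and the check-digit validation is an extra step that I am assuming would be useful, based on the ISO 7064 MOD 11-2 checksum that ORCID identifiers carry.

```python
import re

# Canonical ORCID URI form; both http and https are accepted here (an assumption).
ORCID_URI = re.compile(r"^https?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_orcid_uri(uri: str) -> bool:
    """True if the URI has the ORCID shape and a valid check character."""
    match = ORCID_URI.match(uri.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def eligible_for_crossref(person_uris) -> bool:
    """Hypothetical rule: at least one author/editor URI is a valid ORCID URI."""
    return any(is_orcid_uri(uri) for uri in person_uris)
```

A record whose author or editor URIs all point elsewhere (VIAF, for instance) would return False and would not offer the Crossref option.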

Submission Workflow

In the admin panel, if a document is eligible for submission to Crossref, a checkbox is available. Clicking on this will fire off a series of actions in the XForms engine:
  1. The TEI/MODS-to-Crossref XML transformation is executed and loaded into an XForms instance
  2. The Crossref XML is serialized to /tmp because it must be attached via multipart/form-data
  3. Because I am still having difficulty getting multipart/form-data to execute correctly in the XForms engine, the engine instead interacts with a PHP script in CGI
  4. After the PHP script responds with a successful HTTP code, the MODS/TEI document is loaded in the XForms engine in order to insert the DOI in the proper location within the document
  5. The TEI/MODS file is saved back to eXist, and the standard publication workflow is executed (a chain of XForms submissions), updating the Solr search index and the triplestore/SPARQL endpoint
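
Step 3 above is delegated to a PHP script in production. For readers unfamiliar with what a multipart/form-data request looks like, here is a minimal sketch in Python of how such a body could be assembled by hand. The function name, the field names, and the file name are hypothetical placeholders, not the actual Crossref parameters or the script used by ETDPub.

```python
import uuid


def build_multipart(fields, file_field, filename, payload):
    """Encode plain form fields plus one XML file as a multipart/form-data body.

    fields: dict of simple text fields (names are placeholders here)
    file_field / filename: name of the upload field and the attached file
    payload: the serialized XML as bytes
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        chunks.append(b"")
        chunks.append(str(value).encode("utf-8"))
    # The XML document goes in as a file part with its own content type.
    chunks.append(f"--{boundary}".encode())
    chunks.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode()
    )
    chunks.append(b"Content-Type: application/xml")
    chunks.append(b"")
    chunks.append(payload)
    chunks.append(f"--{boundary}--".encode())
    body = b"\r\n".join(chunks) + b"\r\n"
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body and content type could then be passed to any HTTP client, for example as the data and Content-Type header of a urllib.request.Request.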
So far two documents in the Digital Library have DOIs connected to ORCIDs:

Taranto 1911: http://dx.doi.org/10.26608/taranto1911
My thesis (Recent Advancements in Roman Numismatics): http://dx.doi.org/10.26608/gruber_roman_numismatics

Friday, July 14, 2017

Improved mapping in EADitor - Brett archaeology photos as a test

At long last, I have migrated from OpenLayers to Leaflet in EADitor. This required modifications in two areas: the HTML pages for rendering EAD finding aids and the map interface. As a result, I introduced two new serializations:

  • The map interface renders Solr search results serialized as GeoJSON (instead of OpenLayers displaying Solr->KML as before)
  • A transformation of an EAD finding aid into GeoJSON. A GeoJSON point is created for each unique mappable place from Geonames or Pleiades, and coordinates are extracted in real time by reading Geonames APIs or Pleiades RDF. The GeoJSON features include references to all uniquely addressable components that include that place in the controlaccess element. Append the extension '.geojson' to a finding aid URI to get a GeoJSON response. Content negotiation will be implemented eventually. See http://numismatics.org/archives/ark:/53695/nnan0037.geojson for example.
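
To make the second serialization more concrete, here is a rough sketch of the grouping logic: one GeoJSON point per unique place, carrying the IDs of every component that references it. EADitor does this in XSLT; the Python below, the input structure, and the property names ("uri", "name", "components") are all assumptions for illustration, and the coordinates in any sample data would be placeholders rather than values read from Geonames or Pleiades.

```python
import json


def mentions_to_geojson(mentions):
    """Group place mentions into one GeoJSON Feature per unique place URI.

    mentions: iterable of dicts with hypothetical keys
              component, uri, name, lat, lon
    """
    by_place = {}
    for m in mentions:
        feature = by_place.get(m["uri"])
        if feature is None:
            feature = {
                "type": "Feature",
                # GeoJSON coordinate order is [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [m["lon"], m["lat"]]},
                "properties": {"uri": m["uri"], "name": m["name"], "components": []},
            }
            by_place[m["uri"]] = feature
        if m["component"] not in feature["properties"]["components"]:
            feature["properties"]["components"].append(m["component"])
    return {"type": "FeatureCollection", "features": list(by_place.values())}


if __name__ == "__main__":
    sample = [
        {"component": "d1e131", "uri": "http://example.org/place/a",
         "name": "Place A", "lat": 38.0, "lon": 23.7},
        {"component": "d1e140", "uri": "http://example.org/place/a",
         "name": "Place A", "lat": 38.0, "lon": 23.7},
    ]
    print(json.dumps(mentions_to_geojson(sample), indent=2))
```

Two components that mention the same place yield a single point whose properties list both component IDs, which is what lets a map popup link back to each photograph.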

 

Restructuring the Agnes Baldwin Brett finding aid

Agnes Baldwin Brett was a curator at the ANS from 1909 to 1912 and a prominent scholar of Greek numismatics. Our archives hold a variety of interesting materials, including photographs from her travels around Greece, Italy, and Turkey in the early 1900s. Numerous photos have been digitized, uploaded to Flickr Commons, and linked to the Brett EAD finding aid. Some photographs were identified and described (with brief text snippets) by ANS archivist David Hill, but all photographs were placed in a single series-level component. All identifiable places were linked in EADitor's Geonames lookup mechanism in a top-level controlaccess element. There was no direct correlation between individual photographs and the people, places, and things depicted.

In order to demonstrate the full functionality of the new mapping interface, I finally took the time to restructure the finding aid so that each photograph would appear in its own item-level component with a controlaccess element enabling individual identification of the place depicted in the photo. Furthermore, while many finding aids have been linked to modern places defined in Geonames, the Brett collection of archaeological photographs provided an opportunity to link photos to ancient places in Pleiades, which would, in turn, open the door to the integration of these valuable materials into the wider Linked Ancient World Data cloud via Pelagios. The photos feature Mycenaean tombs, Greek temples, and even the Grave Stele of Hegeso.

Identifying individual monuments within Athens


Not only that, some photographs feature other students from the American School of Classical Studies at Athens who went on to be prominent scholars later in life. Since many of these scholars have produced published works and archival materials held at other institutions, they have URIs in the Social Networks and Archival Context project. EADitor has had SNAC lookups for quite some time, and so I was able to link photos to these URIs when applicable. I hope that we can make these photos available to researchers even beyond the ancient world.

Linking people to SNAC
In addition to the tagging of places and people, many photographs feature known archaeological monuments that are notable enough to warrant their own Wikipedia articles, and therefore Wikidata entity URIs. I extended the subject lookup mechanism in EADitor beyond the standard Library of Congress Subject Headings to query the Wikidata API, embedding entity IDs directly into the EAD finding aid, which are then transformed into dcterms:subject URIs upon RDF serialization.
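
As a rough illustration of the lookup-then-serialize idea, the sketch below builds a search URL for the Wikidata API's wbsearchentities action and turns a chosen entity ID into a dcterms:subject triple. EADitor does this in XForms and XSLT; the Python function names and the component URI used in any example are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def wikidata_search_url(term, language="en", limit=10):
    """Build a wbsearchentities request URL for a subject term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def entity_uri(entity_id):
    """Expand an item ID such as 'Q42' into its Wikidata entity URI."""
    if not (entity_id.startswith("Q") and entity_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {entity_id!r}")
    return f"http://www.wikidata.org/entity/{entity_id}"


def subject_triple(component_uri, entity_id):
    """Serialize one dcterms:subject statement as an N-Triples line."""
    return f"<{component_uri}> <{DCTERMS_SUBJECT}> <{entity_uri(entity_id)}> ."
```

Storing only the entity ID in the finding aid and expanding it at serialization time keeps the EAD compact while still producing full URIs in the RDF.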

 

EAD to RDF

Since each individual component has an ID in EADitor, each component is uniquely addressable by fragment identifiers, e.g., http://numismatics.org/archives/ark:/53695/nnan0037#d1e131. After I made some minor modifications to the RDF output to conform with the emerging schema.org archival extension, these Wikidata, SNAC, Pleiades, and Geonames URIs are now exposed in the RDF for each component, and the components are hierarchically linked together.

@prefix arch: <http://purl.org/archival/vocab/arch#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://numismatics.org/archives/ark:/53695/nnan0037#d1e131> a schema:ArchiveItem ;
    dcterms:coverage <http://www.geonames.org/264371> ;
    dcterms:date "1900-12-07"^^xsd:date ;
    dcterms:identifier "06-00242" ;
    dcterms:isPartOf <http://numismatics.org/archives/ark:/53695/nnan0037#c_92f631e3f903281a8cdedbfebfca0654> ;
    dcterms:subject <http://socialarchive.iath.virginia.edu/ark:/99166/w61c5qjp> ;
    dcterms:title "American School students wearing bug bags" ;
    dcterms:type <http://vocab.getty.edu/aat/300046300> ;
    foaf:depiction <http://farm9.staticflickr.com/8320/8003385533_c83827b679_o.jpg> ;
    foaf:thumbnail <http://farm9.staticflickr.com/8320/8003385533_55f1f093b1_t.jpg> .

This RDF is posted into Archer's SPARQL endpoint.

Archer RDF → SPARQL → Pelagios RDF

Now that we have numerous uniquely addressable photographs linked to Pleiades URIs published in our SPARQL endpoint, it was a breeze to create an RDF export for Pelagios. It is essentially a DESCRIBE query, and our model of RDF is run through XSLT into the Pelagios data model.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
DESCRIBE ?s WHERE {
 ?s dcterms:coverage ?place FILTER (strStarts(str(?place), 'https://pleiades.stoa.org'))  
}

The link to the Pelagios VoID is available on the front page of Archer. It is generated by an ASK query, similar to the DESCRIBE query above, that checks whether there are any objects in the SPARQL endpoint with Pleiades places expressed by the dcterms:coverage property.
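
A minimal sketch of that check follows. The ASK query is my guess at what "similar" means (the DESCRIBE pattern reused with ASK), and the endpoint URL, function names, and the Python itself are illustrative assumptions; Archer performs this inside its own XML pipeline. The one standard detail relied on is that the SPARQL JSON results format returns a top-level "boolean" member for ASK queries.

```python
from urllib.parse import urlencode

# Assumed ASK counterpart to the DESCRIBE query used for the Pelagios export.
ASK_PLEIADES = """PREFIX dcterms: <http://purl.org/dc/terms/>
ASK {
  ?s dcterms:coverage ?place
  FILTER (strStarts(str(?place), 'https://pleiades.stoa.org'))
}"""


def ask_url(endpoint, query=ASK_PLEIADES):
    """Build a SPARQL protocol GET URL for the query (endpoint is hypothetical)."""
    return f"{endpoint}?{urlencode({'query': query})}"


def show_void_link(sparql_json):
    """Decide whether to display the VoID link from a parsed ASK response."""
    return bool(sparql_json.get("boolean", False))
```

If the endpoint holds no components with Pleiades coverage, the response is false and the front page simply omits the link.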

Summary

The Brett collection is incredibly interesting, and I hope that we will be able to digitize more photographs and the corresponding travel diary at some point in the future. There are still many photographs that haven't been identified, and so perhaps we might be able to accomplish this through crowdsourcing. We will implement a IIIF server by the end of summer and begin the transition of our archival materials into IIIF--not only photographs, but also the Newell diaries. Perhaps one day we will be able to annotate the people, places, and things from the Brett diary and photographs with Mirador or a similar IIIF viewer. While Pelagios integration is somewhat imminent, the aggregation of disparate archival holdings through shared SNAC identifiers is still further along the horizon.

Tuesday, February 28, 2017

Final four Mellon-funded TEI ebooks published

The final four of a group of 86 American Numismatic Society-published books have been checked and uploaded to our Digital Library. Here are some stats I was able to produce from various SPARQL queries of the TEI->Open Annotation RDF:

  • 349 mentions of 164 different Greek coin hoards published in IGCH in 193 sections in 14 books.
  • 266 unique references to Nomisma URIs. 146 are mints or regions, and 87 of these identifiers are matches with Pleiades places. These mint references appear in 600 sections in 51 books. Including direct Pleiades references (and not only those which are implicit by means of Nomisma concordances), there are 621 sections in these 51 books which will be accessible through the Pelagios Project.
  • 97 of the 266 references are to people, most of whom are linked to Wikidata and VIAF entities that are, in turn, linked to other systems, such as Social Networks and Archival Context.
  • More than 1,400 coins in the ANS collection are referenced
  • 139 Roman Imperial coin types in OCRE
  • 4 Roman Republican coin types in CRRO 
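
The figures above came from SPARQL queries, but the counting logic behind a line like "N mentions of M different hoards in X sections in Y books" is easy to show in miniature. The sketch below assumes each annotation has been reduced to a body URI (the entity mentioned), a section URI, and a book URI; that structure, the function name, and the sample URIs in the usage note are hypothetical.

```python
def summarize(annotations, prefix):
    """Tally mentions of entities whose URI starts with the given prefix.

    annotations: iterable of dicts with hypothetical keys body, section, book
    Returns counts of mentions, distinct entities, distinct sections, and books.
    """
    hits = [a for a in annotations if a["body"].startswith(prefix)]
    return {
        "mentions": len(hits),
        "entities": len({a["body"] for a in hits}),
        "sections": len({a["section"] for a in hits}),
        "books": len({a["book"] for a in hits}),
    }
```

Calling summarize(rows, "http://coinhoards.org/id/") would give the hoard statistics, and swapping the prefix for another namespace would give the equivalent figures for that system.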
These four are the final of 86 total books digitized as part of the NEH-Mellon Open Humanities Book program.  Many thanks to both the National Endowment for the Humanities and the Mellon Foundation for making this possible. The framework and methodologies implemented in this project will be applied to further digitization here at the ANS as we move toward making our entire collection of monographs freely and openly accessible, and I hope that other academic publishers and learned societies will follow in our footsteps in this endeavor.

These books go beyond simple transcription and publication as EPUB files. With links to our own research databases internally and externally to Linked Open Data information systems, we hope that these works will be transformed into research portals to further context about the people, places, events, etc. mentioned in the text. On the other side of the coin, so to speak, researchers interested in the entities, objects, coin hoards, etc. will have access to a wealth of historical information about these things and will gain access to our monographs not only from our own Library, Archive, and Museum systems, but through projects like Pelagios, Digital Public Library of America, and other large scale aggregators of cultural heritage materials.

Friday, January 13, 2017

More than 80 LOD-enhanced ebooks published to the ANS Digital Library

The American Numismatic Society has nearly completed its Mellon Foundation-funded Humanities Open Book program. Eighty-two of 86 books have been enhanced by Whitney Christopher, a TEI specialist from the King's College London DH program, to link to people and places defined on Nomisma.org, Pleiades (either directly linked or by means of Nomisma's internal concordance system), VIAF, Wikidata, and the ANS's own archival authority control system. The final four books will go online soon. They are all available in the ANS Digital Library.

The number of people and places mentioned in these texts is a staggering figure, and it should be noted that we have focused on linking those entities that are most relevant to the texts, but we will continue to refine the linking over time, especially when it comes to Nomisma concepts and bibliographic references to Worldcat Works (links to which have not yet been incorporated). As Nomisma expands further into the Greek world and other domains of numismatics (after the ancient period), we will return to these ebooks to insert or replace links to Nomisma mints, people, and political entities.

Beyond relevant people and places, we have inserted hundreds of links to IGCH records (about 170 different coin hoards are cited in 400 locations in a handful of books), to the ANS collection, and to coin types defined in OCRE or CRRO. So far, more than 100 coins in the ANS and 6 in the Smithsonian American Art Museum have been identified by their accession numbers, although one of the four remaining books to be published will soon include nearly 70 more links to ANS coins. There are many more coins referenced in these books that may now belong to the ANS, but were not accessioned at the date of publication. A curator with more specific knowledge will need to identify these in the future.

One of the most often cited hoards is the Demanhur Hoard (IGCH 1664), which is mentioned in four books and on various pages of two of Edward Newell's notebooks. By linking archival authorities mentioned in these texts, we have greatly enhanced access to the works by and about Edward Newell and other prominent numismatic figures associated with the Society. A user of the ANS's authority portal (built on EAC-CPF) will have access to books written by Newell in our digital library, as well as his archival materials. Furthermore, mentions of Newell from the books written by other scholars will appear under annotations. In his case, he is mentioned in 18 other books, sometimes in multiple sections.

Like Mantis, the OCRE and CRRO config files have been updated to link to our archival SPARQL endpoint, and therefore annotations about specific types are accessible directly through types defined in these systems. Nearly 50 types in OCRE are linked from Roman Medallions, and a researcher can drill down into a specific section of the book from RIC 5 Gallienus and Salonina 1.

Finally, through the links to Pleiades, each section in each book that mentions an ancient place will be accessible in Pelagios.

Monday, September 26, 2016

Publication of the NEH/Mellon Open Humanities ebooks

About a month ago, we pushed about 85 TEI files into production in the ANS Digital Library. These ebooks were transcribed from HathiTrust scans as part of the NEH/Mellon Open Humanities Book Program. Not all of the books have value-added tagging yet. We hired a TEI specialist several weeks ago to begin the process of linking coins, coin types, hoards, people, places, and other subject matter in the body of these books to URIs in our or other information systems.

So far three of these books are complete:
  1. The Fifth Dura Hoard
  2. The Earliest Coins of Norway
  3. The Medallic Work of A.A. Weinman
Like the first book published into our Digital Library (Noe's Coin Hoards), the TEI links have been transformed into RDF conforming to Open Annotation, and these annotations are available in our other systems. For example, J. Sanford Saltus is referenced in The Medallic Work of A.A. Weinman, and so this annotation is available in the biography of Saltus in our EAC-CPF-driven authority system.
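
For readers curious what such an annotation looks like, here is a minimal sketch that emits the three core Open Annotation statements (the annotation's type, its body, and its target) as N-Triples. The oa: terms are from the W3C namespace; the annotation, section, and authority URIs in the usage note are placeholders, and the real output of our XSLT may include more properties than shown here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.w3.org/ns/oa#"


def annotation_triples(annotation_uri, body_uri, target_uri):
    """Return N-Triples lines linking a text section (target) to an entity (body)."""
    return [
        f"<{annotation_uri}> <{RDF_TYPE}> <{OA}Annotation> .",
        f"<{annotation_uri}> <{OA}hasBody> <{body_uri}> .",
        f"<{annotation_uri}> <{OA}hasTarget> <{target_uri}> .",
    ]


if __name__ == "__main__":
    # Placeholder URIs: a section of a book that mentions a person.
    for line in annotation_triples(
        "http://example.org/annotation/1",
        "http://example.org/authority/person",
        "http://example.org/book#section1",
    ):
        print(line)
```

Because the body is the entity URI, any other system that knows that URI can query the endpoint for annotations pointing at it and link back to the exact section of the book.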

Most of the remaining books should have completed value-added TEI markup by the end of the year.

Thursday, March 17, 2016

First EBook published as part of Mellon/NEH Humanities Open Book Project

This is a follow-up to some major feature additions in MANTIS and IGCH detailed on the Numishare blog.

Today, we have published our first out of print, open access EBook for the NEH/Mellon Foundation Humanities Open Book Program. It is Sydney Noe's 1920 Coin Hoards, the first issue of Numismatic Notes and Monographs. As we discussed in our grant application, we had a vendor transcribe these PDFs of images we received from HathiTrust into TEI. The TEI is run through a normalization XSLT stylesheet to correct some issues and pull bibliographic metadata from various sources, and then value-added tagging is applied to link to coins in our collection, hoards on coinhoards.org, and entities in various geographic gazetteers or linked open data vocabulary systems.

As a result, we not only have a digital text that you can view in your browser as HTML5 or download as an EPUB 3.0.1, but a richly-tagged document that is exposed as RDF conforming to Open Annotation, which is then published into our archival SPARQL endpoint (and soon published into Pelagios). Many of the technical features of this publication process have already been discussed in this blog or in the post linked above.

This framework is part of a broader effort to integrate all of our Library, Archive, and Museum holdings into a central hub for numismatic research. It is therefore possible to gain further insight about the people, places, and things mentioned in these digital publications through Linked Open Data methodologies, but also to provide greater context to our data-driven numismatic research projects like IGCH, OCRE, etc.

Not only do we have a rich set of interlinked numismatic projects focusing on hoards, coins, and coin types, but we now also have links between these things and numismatic monographs and journals, archival research notebooks, finding aids, and authority records. Not only is it possible to read biographical information about Sydney Noe in Archer, but you can also view a map and timeline of his life, see his social network graph, and gain access to a list of materials written by or about him.

This is the topic of my CAA presentation in Oslo in a few weeks.

Friday, March 11, 2016

Toward a more thoroughly integrated numismatic research system

I am making updates to our systems in preparation for the initial publication of NEH/Mellon EBooks. Part of the project is to thoroughly integrate these EBooks with our collection, archives, IGCH, and related project databases. I still have some work to do, but should have the first EBooks ready next week.
I updated the RDF model for our digitized Newell notebooks to conform to the Open Annotation model used for our EBooks (one book has been published so far, the ANS medals book by Miller). This means that mentions of IGCH hoards, other scholars represented in our biographies site, and [soon] individual coins in Newell's notebooks will be made available through those other interfaces.

See http://coinhoards.org/id/igch1399

  • You can click on individual pages where Newell notes IGCH1399, and the page will load in Archer.
  • You can see a list of coin types from this hoard, and you can download the list of coin types or a full list of coins from the hoard (note that we aren't publishing our Greek coins that aren't connected to coin type URIs in nomisma.org's SPARQL endpoint).

On http://numismatics.org/authority/id/newell (an EAC-CPF authority record)

These already functioned --
  • See a list of archival materials about Edward Newell
  • (Fairly new) Several annotations in Miller's Medallic Arts of the ANS where he mentions Newell. You can click a link to go directly to a section.
  • A social network graph showing Newell and his relations (also driven by SPARQL, detailed here).

On http://numismatics.org/authority/id/noe
  • As before, you can get a list of archival materials about Noe
  • Newell mentions Noe on two pages of a notebook

Next steps:
  1. Update the code for Mantis to display annotations about specific coins referenced in Newell's notebooks or our EBooks.
  2. Update the Pelagios exports for the Digital Library and Archer to make our EBooks and archival materials more broadly accessible to the ancient world community
  3. Build widgets into our Digital Library to pull data from our other systems

This interlinking will be inherent to the publication mechanism for our EBooks. When we publish the first several next week, the annotations will be available in Mantis, the Archer Biographies, IGCH, etc.

I will be discussing these things and more in my presentation at CAA in Oslo at the end of the month.