Thursday, June 26, 2014

First Newell notebook published in Archer

The first of several dozen digitized Greek coin hoard research notebooks written by Edward Newell has been published into the newly-relaunched ANS archival resource, Archer. This is the first publicly-accessible product from a grant of $7,500 received from The Gladys Krieble Delmas Foundation to digitize these valuable resources. This notebook, written in 1939, includes notes on a handful of Greek coin hoards which were eventually published in An Inventory of Greek Coin Hoards (IGCH).

Specifications

We wanted to go beyond simply scanning page images and offering each notebook as an open access PDF. We wanted to include zoomable images, basic pagination functionality, and annotation of particular items on each page. We decided not to go as far as to offer full transcriptions, but the annotations could include free text or links to other resources on the web: such as linking to coins in the ANS collection, contemporary scholars mentioned in the text that might have entries in VIAF or dbpedia, linking to nomisma.org defined IGCH entries and other identifiers for mints, regions, or numismatic authorities, books in the ANS library, and place names in Geonames. Furthermore, these terms should be indexed into Solr and populate the facet terms in Archer's browse interface, and RDF should be made available for linking resources together.

Technical Underpinnings

We chose TEI XML as the vessel for encoding data about the notebooks. The TEI files include bibliographic metadata in the header and a list of facsimile elements that link to page images and contain xy coordinates for the annotations. The elements also include multiple surface elements for each annotation, with mixed content of free text and links. We have hooked into Rainer Simon's Annotorious library in both the public user interface and the XForms-based backend. Both interfaces are now part of EADitor's core functionality.

It should be stressed that the TEI editing functionality in EADitor is tailored toward annotation of facsimile images. The header is not yet editable, nor is the body, but this functionality may be enhanced in the future. Nevertheless, when you open the TEI form, there is a Facsimiles tab. In this tab are two columns. On the left is a list of thumbnails. Clicking on a thumbnail will load the large image into OpenLayers and make it annotatable through Annotorious.


The JSON generated by Annotorious gets parsed by Orbeon (the XForms processor) and transformed into TEI and inserted into the XML document (not just the text annotations, but annotation coordinate values crosswalked into TEI attributes).
<facsimile xml:id="nnan0-187715_X007">
  <graphic url="0-187715_X007" n="Loose 2"/>
  <surface xml:id="aj6xve43lj5" ulx="-0.86506513142592" uly="1.3798999767388" lrx="-0.45771458478717" lry="1.2864639140252">
     <desc>
      <ref target="http://nomisma.org/id/sardes">Sardis</ref>
    </desc>
  </surface>
</facsimile>
URIs are parsed and made into TEI "ref" elements. The label of the element may be extracted from an API, depending on the content of the URI. For example, a URI matching "http://numismatics.org/collection" will append ".xml," and the title will be pulled from the NUDS/XML serialization. Similarly, an AACR2-conformant place name label will be generated from querying Geonames' APIs and RDF is pulled from nomisma.org, dbpedia, or VIAF to extract the skos:prefLabel or rdfs:label (see more with the XPL and XSLT).


Once a new annotation is created and URL is parsed, the link will be clickable. When you click on a different thumbnail in the left-hand column to load another image for annotation, the TEI file will be saved to eXist, and the document will be re-published to Solr, if it had previously been designated as public, and re-published to the RDF triplestore, if the TEI document is both public and EADitor's config contains endpoint URLs for updating and posting data.

Public User Interface

Since the EAC-CPF based authority records delivered through xEAC and Archer are now both hooked into an RDF triplestore and SPARQL endpoint, the notebooks will appear under the list of related resources in the Newell authority record, together with finding aids (encoded in EAD) in which Newell is a creator or correspondent or photographs (encoded in MODS) in which he appears.

The interface for the notebook itself  is not markedly different than for other record types in EADitor. In the right column is OpenLayers+Annotorious (in read-only mode) for showing images and annotations. The annotations are rendered by querying a REST protocol inherent to EADitor which transforms the TEI facsimile element into Annotorious' JSON model by passing in a few request parameters. Images can be paged through by clicking on thumbnails, next/previous links, or a drop down menu.


In the left column is bibliographic metadata and a list of unique terms that appear in the TEI document. The terms are clickable links to direct the user to the results page for that term's query. A user may click on the external link image to load the target URL (for example, to view the VIAF or Geonames page). Lastly, under each term are page numbers on which the term appears in the document. Clicking on one of these page number links will load the image and annotations in OpenLayers.

There are still some more improvements to make with the system, especially in making the TEI Header editable, but I think that this framework has really great potential for lowering the barriers to creating, editing, and publishing TEI, especially in a large scale linked open data system.

No comments:

Post a Comment