Friday, August 29, 2014

xEAC pre-production release ready for wider testing

xEAC (https://github.com/ewg118/xEAC), an open source, XForms-based framework for the creation and publication of EAC-CPF records (for archival authorities or scholarly prosopographies), is now ready for another round of testing. While xEAC is still under development, it is essentially production-ready for small-to-medium collections of authority records (fewer than 100,000).

xEAC handles the majority of the elements in the EAC-CPF schema, with particular focus on enhancing controlled vocabulary with external linked open data systems and the semantic linking of relations between entities. The following LOD lookup mechanisms are supported:

  • Geography: Geonames, LCNAF, Getty TGN, Pleiades Gazetteer of Ancient Places
  • Occupations/Functions: Getty AAT
  • Misc. linking and data import: VIAF, DBpedia, nomisma.org, and SNAC

xEAC supports transformation of EAC-CPF into rudimentary forms of three different RDF models, and it can post that data into an RDF triplestore when the system is optionally connected to a SPARQL endpoint. Additionally, EADitor (https://github.com/ewg118/eaditor), an open source framework for EAD finding aid creation and publication, can hook into a xEAC installation for controlled vocabulary as well as for posting to a triplestore, making it possible to link archival authorities and content through LOD methodologies.
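
As a rough illustration of what an EAC-CPF to RDF transformation followed by a triplestore post involves, here is a minimal Python sketch. It is not xEAC's own transformation: the FOAF mapping, the example.org base URI, and the helper names (eac_to_ntriples, to_sparql_update) are assumptions made for this example, and the sample record is deliberately not schema-complete.

```python
import xml.etree.ElementTree as ET

EAC_NS = {"eac": "urn:isbn:1-931666-33-4"}  # EAC-CPF namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Assumed mapping from EAC-CPF entityType values to FOAF classes.
TYPE_CLASSES = {
    "person": "http://xmlns.com/foaf/0.1/Person",
    "corporateBody": "http://xmlns.com/foaf/0.1/Organization",
    "family": "http://xmlns.com/foaf/0.1/Group",
}

# Trimmed, hypothetical record used only for this illustration.
SAMPLE_EAC = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control><recordId>newell</recordId></control>
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Newell, Edward T.</part></nameEntry>
    </identity>
  </cpfDescription>
</eac-cpf>"""


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def eac_to_ntriples(xml_text, base_uri):
    """Return a list of N-Triples lines describing one EAC-CPF record."""
    root = ET.fromstring(xml_text)
    record_id = root.findtext("eac:control/eac:recordId", namespaces=EAC_NS)
    subject = f"<{base_uri}{record_id}>"
    triples = []
    entity_type = root.findtext(
        "eac:cpfDescription/eac:identity/eac:entityType", namespaces=EAC_NS
    )
    if entity_type in TYPE_CLASSES:
        triples.append(f"{subject} <{RDF_TYPE}> <{TYPE_CLASSES[entity_type]}> .")
    parts = root.iterfind(
        "eac:cpfDescription/eac:identity/eac:nameEntry/eac:part", EAC_NS
    )
    for part in parts:
        if part.text:
            triples.append(f"{subject} <{FOAF_NAME}> {literal(part.text.strip())} .")
    return triples


def to_sparql_update(triples):
    """Wrap the triples in a SPARQL 1.1 Update request body (not sent here)."""
    return "INSERT DATA {\n" + "\n".join(triples) + "\n}"


if __name__ == "__main__":
    print(to_sparql_update(eac_to_ntriples(SAMPLE_EAC, "http://example.org/authority/")))
```

The resulting string could be sent to a triplestore's update endpoint with any HTTP client; the sketch leaves that step out so that it runs without a network connection.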

The recently released American Numismatic Society biographies (http://numismatics.org/authorities/) and the new version of the archives (http://numismatics.org/archives/) illustrate this architecture. For example, the authority record for Edward T. Newell (http://numismatics.org/authority/newell) contains a list of archival resources that is generated dynamically from a SPARQL query. This method is more scalable and sustainable in the long run than using the EAC resourceRelation element. Now that SPARQL has been successfully implemented in xEAC, I will begin to integrate social network analysis interfaces into the application.
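
To make the idea of a dynamically generated resource list concrete, the following Python sketch builds a SPARQL query for resources that point at an authority URI and then reads a response in the standard SPARQL JSON results format. The dcterms:subject predicate, the function names, and the sample response are assumptions for illustration; the query that xEAC actually issues may look quite different.

```python
import json


def related_resources_query(authority_uri,
                            predicate="http://purl.org/dc/terms/subject"):
    """Build a SELECT query for resources linked to an authority URI.

    The linking predicate is an assumption; swap in whatever the
    triplestore's data model really uses.
    """
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?resource ?title WHERE {\n"
        f"  ?resource <{predicate}> <{authority_uri}> ;\n"
        "            dcterms:title ?title .\n"
        "}\n"
        "ORDER BY ?title"
    )


def parse_bindings(results_json):
    """Turn a SPARQL JSON results document into (uri, title) pairs."""
    data = json.loads(results_json)
    return [
        (row["resource"]["value"], row["title"]["value"])
        for row in data["results"]["bindings"]
    ]


# Hypothetical response, shaped like the W3C SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["resource", "title"]},
    "results": {"bindings": [
        {
            "resource": {"type": "uri", "value": "http://example.org/archives/id/1"},
            "title": {"type": "literal", "value": "Example correspondence"},
        }
    ]},
})

if __name__ == "__main__":
    print(related_resources_query("http://numismatics.org/authority/newell"))
    for uri, title in parse_bindings(SAMPLE_RESPONSE):
        print(f"{title} <{uri}>")
```

Because the list is computed at request time, new finding aids appear on the authority page as soon as their triples are posted, without editing the EAC-CPF record itself.
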
More information:

Extended Linked Data Controlled Vocabulary in xEAC and EADitor

Getty TGN

Last week, the Getty announced the latest installment of their linked open data vocabularies: the Thesaurus of Geographic Names. Like the previously released AAT, the TGN is available through a SPARQL endpoint. After returning from the Semantic Technology and Business conference in San Jose (which I have discussed in another blog post), I set out to integrate TGN lookups into the various cultural heritage data frameworks that I'm developing.

Both xEAC and EADitor have been extended to enable lookups of the Getty TGN through their editing interfaces. The functionality is identical to the occupation and function lookups in both systems: (1) the user performs a text search for a term, (2) the XForms engine submits a SPARQL query to the Getty endpoint, and (3) the user then selects the appropriate item from a list generated from the SPARQL response. See the example from xEAC, below:


The geographic lookup mechanism in xEAC also includes an option for geographic names in the Library of Congress Name Authority File.
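
The three lookup steps described above can be sketched in Python as follows. This is only an approximation of the flow, not the query the XForms engine sends: the endpoint address, the plain SKOS label filter, the function names, and the placeholder TGN identifier in the sample response are all assumptions made for this example. The response is parsed as SPARQL XML results, since that is the kind of document an XForms engine would consume.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

GETTY_SPARQL = "http://vocab.getty.edu/sparql"  # assumed endpoint address
SR = {"sr": "http://www.w3.org/2005/sparql-results#"}


def tgn_query(term, limit=10):
    """Steps 1 and 2: turn the user's search term into a SPARQL query."""
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?place ?label WHERE {\n"
        "  ?place skos:inScheme <http://vocab.getty.edu/tgn/> ;\n"
        "         skos:prefLabel ?label .\n"
        f'  FILTER CONTAINS(LCASE(STR(?label)), LCASE("{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(term, endpoint=GETTY_SPARQL):
    """Build the GET URL defined by the SPARQL protocol (query parameter)."""
    return f"{endpoint}?{urlencode({'query': tgn_query(term)})}"


def parse_choices(results_xml):
    """Step 3: turn SPARQL XML results into (uri, label) pairs for a pick list."""
    root = ET.fromstring(results_xml)
    choices = []
    for result in root.findall("sr:results/sr:result", SR):
        row = {}
        for binding in result.findall("sr:binding", SR):
            # Each binding wraps a single <uri> or <literal> child.
            row[binding.get("name")] = binding[0].text
        choices.append((row.get("place"), row.get("label")))
    return choices


# Hypothetical response; the TGN identifier is a placeholder, not a real record.
SAMPLE_RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="place"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="place"><uri>http://vocab.getty.edu/tgn/0000000</uri></binding>
      <binding name="label"><literal xml:lang="en">Example Place</literal></binding>
    </result>
  </results>
</sparql>"""

if __name__ == "__main__":
    print(request_url("Athens"))
    print(parse_choices(SAMPLE_RESULTS))
```

A substring filter like this scans every label, so a production lookup would more likely rely on whatever full-text search facility the endpoint offers.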

 

SNAC Integration

In addition to extending the geographic lookup functionality in both EADitor and xEAC, I have also implemented a SNAC lookup in both applications. With the addition of two URL parameters, the search results page in SNAC can provide the raw cross query XML response instead of the default HTML. I hope that SNAC will eventually provide a documented search API that returns results in a more formal standard, like Atom.

In xEAC, the lookup embeds the SNAC URI into the otherRecordId and source elements within the EAC-CPF control element. Nothing else is pulled from SNAC at the moment, either into the EAC record or into the public user interface, although this could change eventually.
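
For readers unfamiliar with the EAC-CPF control section, this Python sketch shows one way a SNAC URI could be written into otherRecordId and source. It is an illustration under assumptions, not xEAC's XForms logic: the function name, the example.org URI standing in for a SNAC URI, and the trimmed sample record are all hypothetical.

```python
import xml.etree.ElementTree as ET

EAC = "urn:isbn:1-931666-33-4"
XLINK = "http://www.w3.org/1999/xlink"
# Keep readable prefixes when the record is serialized again.
ET.register_namespace("", EAC)
ET.register_namespace("xlink", XLINK)

# Trimmed, hypothetical record; a valid control element has more children.
SAMPLE_EAC = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control>
    <recordId>newell</recordId>
    <maintenanceStatus>revised</maintenanceStatus>
  </control>
</eac-cpf>"""


def embed_snac_uri(eac_xml, snac_uri):
    """Add the URI as an otherRecordId and as a source/@xlink:href."""
    root = ET.fromstring(eac_xml)
    control = root.find(f"{{{EAC}}}control")

    # The schema places otherRecordId directly after recordId.
    record_id = control.find(f"{{{EAC}}}recordId")
    other = ET.Element(f"{{{EAC}}}otherRecordId")
    other.text = snac_uri
    control.insert(list(control).index(record_id) + 1, other)

    # sources is the last child of control; create it if it is missing.
    sources = control.find(f"{{{EAC}}}sources")
    if sources is None:
        sources = ET.SubElement(control, f"{{{EAC}}}sources")
    source = ET.SubElement(sources, f"{{{EAC}}}source")
    source.set(f"{{{XLINK}}}type", "simple")
    source.set(f"{{{XLINK}}}href", snac_uri)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(embed_snac_uri(SAMPLE_EAC, "http://example.org/snac/12345"))
```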

In EADitor, the persname, corpname, and famname element components have been extended to include the SNAC lookup in addition to VIAF and xEAC (if a xEAC instance has been added into the EADitor settings). The SNAC URI is stored in the @authfilenumber of the associated EAD element.

SNAC URIs that are embedded into EAD finding aids (like the URIs from other linked open data vocabulary systems) will be included in the RDF serialization of the archival collection data. This may pave the way for users of EADitor to make their content accessible through SNAC, or whatever international archival entity system evolves from SNAC, by means of linked open data technologies.
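
As a small illustration of how URIs stored in @authfilenumber could flow into an RDF serialization, the Python sketch below collects them from persname, corpname, and famname elements and emits one triple per URI. The dcterms:subject predicate, the example.org URIs, and the function names are assumptions; EADitor's own RDF output is not reproduced here.

```python
import xml.etree.ElementTree as ET

NAME_ELEMENTS = ("persname", "corpname", "famname")

# Trimmed, hypothetical finding aid fragment in the EAD 2002 namespace.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <controlaccess>
      <persname authfilenumber="http://example.org/snac/12345">Newell, Edward T.</persname>
      <corpname>Unlinked organization</corpname>
    </controlaccess>
  </archdesc>
</ead>"""


def linked_name_uris(ead_xml):
    """Return (element, uri, label) for every name element carrying a URI."""
    root = ET.fromstring(ead_xml)
    found = []
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip the namespace, if any
        uri = el.get("authfilenumber", "")
        if local in NAME_ELEMENTS and uri.startswith("http"):
            found.append((local, uri, (el.text or "").strip()))
    return found


def collection_triples(collection_uri, ead_xml,
                       predicate="http://purl.org/dc/terms/subject"):
    """Emit N-Triples linking the collection to each embedded URI."""
    return [
        f"<{collection_uri}> <{predicate}> <{uri}> ."
        for _, uri, _ in linked_name_uris(ead_xml)
    ]


if __name__ == "__main__":
    for line in collection_triples("http://example.org/archives/id/1", SAMPLE_EAD):
        print(line)
```

Names without a URI in @authfilenumber are simply skipped, which mirrors the point that only linked entities contribute to the linked data graph.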