Wednesday, February 5, 2014

Integrating EADitor with the Getty linked data AAT

I've been following linked open data developments at the Getty closely over the last few months, especially with regard to incorporating Getty AAT URIs (and eventually identifiers from other vocabulary systems) into and my side-project, a LOD thesaurus geared specifically toward Greek pottery.

For some reason, it only occurred to me yesterday that I should adapt EADitor to incorporate Getty AAT identifiers into EAD finding aids. After all, XForms applications communicate nicely with other REST services (such as SPARQL endpoints), and I've already done SPARQL query work in XForms with Nomisma's backend. I spent about half an hour this afternoon improving the genreform functionality in EADitor to make the AAT (rather than the Library of Congress Genre/Format Terms) the default lookup mechanism.

Here's how it works:

User Interface

  1. Add a genreform element into your controlled access headings in your EAD finding aid.
  2. Select the Getty AAT radio button (selected by default) to activate the query interface.
  3. Type a term and click the search button.
  4. A list of results (limited to 25, filtered to English labels, and sorted alphabetically) will appear in the select list. After choosing an option, click the "Select" button to set the text of the genreform element to the skos:prefLabel from the Getty SPARQL results and to set its @authfilenumber attribute to the Getty ID.
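To make step 4 concrete, here is a minimal Python sketch of the change the "Select" button makes to the EAD, using the standard library's ElementTree rather than XForms. The function name apply_aat_selection, the label, and the ID are hypothetical, and the EAD fragment is un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET


def apply_aat_selection(genreform: ET.Element, pref_label: str, aat_id: str) -> None:
    """Copy a chosen AAT concept onto an EAD genreform element.

    The element text becomes the concept's skos:prefLabel and
    @authfilenumber holds the Getty ID, as described in step 4.
    """
    genreform.text = pref_label
    genreform.set("authfilenumber", aat_id)


# Hypothetical controlled access heading containing an empty genreform.
controlaccess = ET.fromstring("<controlaccess><genreform/></controlaccess>")
genreform = controlaccess.find("genreform")

# Hypothetical label and ID, for illustration only.
apply_aat_selection(genreform, "photographs", "300000000")
print(ET.tostring(controlaccess, encoding="unicode"))
```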

Under the Hood

Clicking the search button does two things. First, it replaces 'SEARCH_QUERY' in the SPARQL query below with the search text from the XForms input. Then it sends an XForms submission with the following action: {encode-for-uri(instance('sparqlQuery'))}&format=xml.

SELECT ?c ?label WHERE {
  ?c rdf:type gvp:Concept .
  ?c skos:prefLabel ?label
  FILTER langMatches(lang(?label), "en") .
  FILTER regex(?label, "SEARCH_QUERY", "i") .
}
ORDER BY ASC(?label)
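The two steps behind the search button can be approximated outside XForms. This Python sketch substitutes the search text into the query above (with its WHERE clause closed) and percent-encodes the result the way XPath's encode-for-uri() does. The endpoint prefix is a hypothetical placeholder, since the post leaves the base of the submission URL out, and escaping quotes and backslashes in the search term is an added precaution rather than something the post describes.

```python
from urllib.parse import quote

# The query from the post; SEARCH_QUERY is replaced with the user's text.
QUERY_TEMPLATE = """SELECT ?c ?label WHERE {
?c rdf:type gvp:Concept .
?c skos:prefLabel ?label
FILTER langMatches(lang(?label), "en") .
FILTER regex(?label, "SEARCH_QUERY", "i") .
}
ORDER BY ASC(?label)"""

# Hypothetical endpoint prefix: the post does not show the base of the URL.
ENDPOINT = "http://example.org/sparql?query="


def build_search_url(search_text: str) -> str:
    # Added precaution: escape backslashes and double quotes so the term
    # stays inside the SPARQL string literal.
    literal = search_text.replace("\\", "\\\\").replace('"', '\\"')
    query = QUERY_TEMPLATE.replace("SEARCH_QUERY", literal)
    # quote(..., safe="") percent-encodes everything except letters, digits
    # and "-_.~", matching XPath's encode-for-uri().
    return ENDPOINT + quote(query, safe="") + "&format=xml"


print(build_search_url("amphora"))
```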

Assume that the query above includes the necessary rdf, skos, and gvp prefix declarations. The options in the user interface's select box are populated from the SPARQL XML results. You can see the code here.
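For the select list, each row of the SPARQL XML results needs to become an option. Here is a rough Python equivalent that reads the W3C SPARQL Query Results XML Format. The function name, the sample response, and the example.org URI are hypothetical, and taking the Getty ID from the last path segment of the concept URI is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"


def parse_options(results_xml: str) -> list[tuple[str, str, str]]:
    """Turn SPARQL XML results into (uri, id, label) tuples for a select list."""
    root = ET.fromstring(results_xml)
    options = []
    for result in root.iter(SR + "result"):
        uri = label = None
        for binding in result.findall(SR + "binding"):
            if binding.get("name") == "c":
                uri = binding.findtext(SR + "uri")
            elif binding.get("name") == "label":
                label = binding.findtext(SR + "literal")
        if uri and label:
            # Assumption: the Getty ID is the last path segment of the URI.
            options.append((uri, uri.rstrip("/").rsplit("/", 1)[-1], label))
    return options


# Hypothetical response containing a single result row.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="c"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="c"><uri>http://example.org/aat/1</uri></binding>
      <binding name="label"><literal xml:lang="en">amphorae</literal></binding>
    </result>
  </results>
</sparql>"""

print(parse_options(SAMPLE))
```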

What's it do?

Beyond the AAT being an excellent controlled vocabulary source and a universally recognized system of identifiers, incorporating Getty AAT IDs into finding aids created with EADitor opens the door to aggregating their content, in a useful way, in other large systems.

EADitor's Flickr integration enables the injection of Getty-based machine tags into photo metadata, and AAT URIs are treated as dcterms:format in RDF serializations. While the Digital Public Library of America doesn't yet make use of linked open data identifiers, doing so is on its agenda. Therefore, finding aids that incorporate AAT identifiers, in addition to VIAF, GeoNames, and LCSH IDs, will be among the most useful to researchers, since they are the most easily categorized and filtered in a large information system such as DPLA.
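As a rough illustration of the dcterms:format mapping, the sketch below turns an AAT ID stored in @authfilenumber into a single N-Triples statement. The function name, resource URI, and ID are hypothetical, and EADitor's actual RDF output may be structured differently.

```python
# Base of Getty AAT URIs and the DCMI Metadata Terms "format" property.
AAT_BASE = "http://vocab.getty.edu/aat/"
DCTERMS_FORMAT = "http://purl.org/dc/terms/format"


def format_triple(resource_uri: str, aat_id: str) -> str:
    """Emit one N-Triples statement linking a resource to an AAT concept."""
    return f"<{resource_uri}> <{DCTERMS_FORMAT}> <{AAT_BASE}{aat_id}> ."


# Hypothetical component URI and AAT ID.
print(format_triple("http://example.org/ead/collection1#c01", "300000000"))
```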

Improving Date Functionality

The default EAD templates in EADitor have been updated to require the @normal attribute for encoding dates, and I have finally gotten around to improving the interface for entering standard ISO 8601-compliant dates (and automatically generating human-readable text). This will ultimately improve the finding aids created in EADitor by making their contents sortable by creation date.

When inserting a date or unitdate anywhere in the document, the user may select the Date or Date Range radio button to display the associated inputs. These values (and the machine-generated human-readable text) are not inserted into the finding aid until they are valid: the Date, From Date, and To Date must conform to the xs:date (yyyy-mm-dd), xs:gYearMonth (yyyy-mm), or xs:gYear (yyyy) format, and the To Date must be later than the From Date. This is a small step, but it should have a large impact on the usefulness of the data for querying.
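The validation rules above can be sketched in Python as follows. The function names are hypothetical, and several details are assumptions of this sketch: partial dates are compared by their first day (so 2014 counts as 2014-01-01), the display text joins a range with " - ", and the optional XML Schema timezone suffix is not accepted. The slash-separated @normal value for ranges follows the common EAD convention.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

Parts = Tuple[int, Optional[int], Optional[int]]

# xs:date (yyyy-mm-dd), xs:gYearMonth (yyyy-mm) or xs:gYear (yyyy);
# assumption: no timezone suffix.
PATTERN = re.compile(r"^([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?$")


def first_day(parts: Parts) -> date:
    # Assumption: a partial date stands for its first day.
    year, month, day = parts
    return date(year, 1 if month is None else month, 1 if day is None else day)


def parse(value: str) -> Optional[Parts]:
    m = PATTERN.match(value)
    if not m:
        return None
    year, month, day = (int(g) if g else None for g in m.groups())
    try:
        # date() rejects impossible values such as 2014-00 or 2014-02-30.
        first_day((year, month, day))
    except ValueError:
        return None
    return year, month, day


def humanize(parts: Parts) -> str:
    year, month, day = parts
    if month is None:
        return str(year)
    if day is None:
        return f"{calendar.month_name[month]} {year}"
    return f"{calendar.month_name[month]} {day}, {year}"


def date_range(from_value: str, to_value: str) -> Optional[Tuple[str, str]]:
    """Return (@normal, display text), or None if the range is invalid."""
    start, end = parse(from_value), parse(to_value)
    if start is None or end is None:
        return None
    # The To Date must be strictly later than the From Date.
    if first_day(end) <= first_day(start):
        return None
    return f"{from_value}/{to_value}", f"{humanize(start)} - {humanize(end)}"


print(humanize(parse("2014-02-05")))
print(date_range("1900", "1950-06-15"))
```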