Thursday, May 26, 2011

Towards georeferencing archival collections

One of the most effective ways to associate objects in archival collections with related objects is with controlled access terms: personal, corporate, and family names; places; subjects. These associations are meaningless if the terms are chosen arbitrarily. With respect to machine processing, "Thomas Jefferson" and "Jefferson, Thomas" are not seen as the same individual when judged by the textual string alone. While EADitor has incorporated authorized headings from LCSH and local vocabulary (scraped from terms found in EAD files currently in the eXist database) almost since its inception, it has not until recently interacted with other controlled vocabulary services. Interacting with EAC-CPF and geographical services is high on the development priority list.
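To make the string-matching problem concrete, here is a minimal Python sketch (not EADitor code, which is XForms-based). The authority table and the identifier "person-0001" are made up for illustration; a real system would draw on an authority file such as LCSH or EAC-CPF records.

```python
# Hypothetical authority table: variant forms of a name mapped to one identifier.
# The identifier "person-0001" is invented for this example.
AUTHORITY = {
    "Jefferson, Thomas": "person-0001",
    "Thomas Jefferson": "person-0001",
}


def same_by_string(a, b):
    """Compare two headings as raw text, the way a naive index would."""
    return a == b


def same_by_authority(a, b, authority=AUTHORITY):
    """Compare two headings by the identifier they resolve to, if any."""
    id_a = authority.get(a)
    id_b = authority.get(b)
    return id_a is not None and id_a == id_b
```

Compared as strings, the two forms of the name differ; compared through the shared identifier, they resolve to the same individual.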

geonames.org


Over the last week, I have been working on incorporating geonames.org queries into the XForms application. Geonames provides stable URIs for more than 7.5 million place names internationally. XML representations of each place are accessible through various REST APIs. These XML datastreams also include the latitude and longitude, which will make it possible to georeference archival collections as a whole or individual items within collections (an item-level indexing strategy will soon be offered in EADitor as an alternative to traditional, collection-based indexing).
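For readers who want to see what such a query looks like outside of XForms, here is a rough Python sketch. It assumes the geonames search endpoint at http://api.geonames.org/search with the q, maxRows, and username parameters, and it parses a made-up sample response rather than calling the live service. The function names and the sample place are illustrative only.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint for the geonames XML search service.
GEONAMES_SEARCH = "http://api.geonames.org/search"


def build_search_url(place, username="demo", max_rows=10):
    """Build a search URL for a place name (no request is sent here)."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


# Made-up response in the general shape of a geonames XML result.
SAMPLE_RESPONSE = """<geonames>
  <totalResultsCount>1</totalResultsCount>
  <geoname>
    <name>Example Town</name>
    <lat>12.5</lat>
    <lng>-45.25</lng>
    <geonameId>1234567</geonameId>
  </geoname>
</geonames>"""


def parse_places(xml_text):
    """Extract name, geonameId, and coordinates from each geoname element."""
    root = ET.fromstring(xml_text)
    places = []
    for g in root.findall("geoname"):
        places.append({
            "name": g.findtext("name"),
            "geonameId": g.findtext("geonameId"),
            "lat": float(g.findtext("lat")),
            "lng": float(g.findtext("lng")),
        })
    return places
```

The latitude and longitude pulled out here are the values that make georeferencing possible later on.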

The new interface in EADitor enables either the querying of geonames or the selection of user-entered terms as part of a localized controlled vocabulary, which is driven by autosuggest and Solr TermsComponent (identical to the LCSH functionality).



The picture above shows an example of the two interfaces. The first enables the user to query the place name. Clicking on the "Search" button sends an xforms:submission to geonames.org's API, which returns a list of applicable places. Selecting one of the places will populate the element with the name, set the @source to "geonames", and set the @authfilenumber to the appropriate geonameId returned from the query. Storing the geonameId enables further querying of geonames APIs to gather the geographic coordinates of the place and any changes to the place names upon indexing the finding aid into Solr.
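The resulting element can be sketched in Python as follows. This is only an illustration of the output described here, not the XForms code that EADitor actually runs; the function name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_geogname(name, geoname_id):
    """Build a geogname element carrying the geonames source and identifier."""
    el = ET.Element("geogname")
    el.set("source", "geonames")
    el.set("authfilenumber", str(geoname_id))
    el.text = name
    return el


# Example with made-up values; serializes the element to an XML string.
example = ET.tostring(make_geogname("Example Town", 1234567), encoding="unicode")
```

Because the geonameId travels with the element, the place can be looked up again at indexing time without relying on the text of the name.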


EADitor now includes a KML querying service, similar to Numishare's, which is detailed on the numishare blog. With OpenLayers, a KML representation of the current search is rendered in the form of a map, pictured above. The mapping component within EADitor is currently simple, but can be expanded in the future. Ultimately, a map-driven query interface like that of MANTIS can be integrated into the application with fairly minimal effort (since both EADitor and Numishare are XSLT/Solr/Ajax driven).
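As a rough idea of what a KML representation of search results involves, here is a small Python sketch. EADitor itself produces KML through XSLT and Solr, so this is a stand-in under stated assumptions: the function name and input format are hypothetical, and only the basic Placemark structure is shown. Note that KML lists coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def places_to_kml(places):
    """Serialize a list of {'name', 'lat', 'lng'} dicts as a KML document."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for p in places:
        placemark = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = p["name"]
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML coordinate order is longitude,latitude.
        coords = "%s,%s" % (p["lng"], p["lat"])
        ET.SubElement(point, "{%s}coordinates" % KML_NS).text = coords
    return ET.tostring(kml, encoding="unicode")
```

A mapping library such as OpenLayers can then load a document of this shape as a layer.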

So how do you get this new functionality in your already-installed .1105 beta?

  1. Follow the Subversion update instructions here.
  2. Since Solr configuration files have changed, restart Apache Tomcat.
  3. Register a free user account at geonames.org to be able to submit API queries 30,000 times per day. The EADitor code uses the "demo" username to query geonames.org, which provides only limited access to the data. Edit geogname.xbl, and replace "demo" with your registered username. This file is located at TOMCAT_HOME/webapps/orbeon/WEB-INF/resources/xbl/eaditor/geogname/geogname.xbl. You will have to replace it twice within the xforms:submissions near the bottom of the file.
With these steps complete you will be able to begin georeferencing your archival collections!
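If you would rather script the username change in step 3 than edit the file by hand, something like the following Python sketch could work. It assumes the placeholder appears in the file as the query parameter "username=demo", which I have not spelled out above, so check the file before relying on it. The function name is hypothetical.

```python
import re


def set_geonames_username(xbl_text, username):
    """Replace 'username=demo' with the given username.

    Returns the new text and the number of replacements made.
    Assumes the placeholder is written as the query parameter 'username=demo'.
    """
    pattern = r"(username=)demo\b"
    new_text, count = re.subn(pattern, lambda m: m.group(1) + username, xbl_text)
    return new_text, count
```

To use it, read geogname.xbl into a string, call the function, confirm that the count is 2 (matching the two xforms:submissions), and only then write the result back to the file.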

Friday, May 13, 2011

EADitor beta .1105 released

I'm pleased to announce a new, long-overdue EADitor beta, .1105.

EADitor is an XForms framework for the creation and editing of Encoded Archival Description (EAD) finding aids using Orbeon, an enterprise-level XForms Java application, which runs in Apache Tomcat. Although the web form is certainly the most important aspect of the application since it can be integrated with existing content management and dissemination systems, EADitor also includes an easily customizable public interface for searching, sorting, and browsing collections of finding aids. This enables institutions to use a single application for content creation and publication.

FEATURES
  • Create and edit EAD finding aids adhering to the EAD 2002 schema (elements are represented at almost every level in the finding aid, with the notable exception of mixed content at the paragraph level).
  • Import EAD 2002 schema or DTD-compliant finding aids into EADitor
  • An administrative user interface for publishing/unpublishing finding aids
  • Simple component reordering interface
  • Controlled vocabulary integration with auto-suggest, including LCSH terms and local vocabularies in subject, persname, famname, corpname, geogname, and genreform. Languages also draw on a controlled vocabulary.
  • Set default templates for the EAD core and components
  • A form for setting agency codes
  • Public interface for searching, browsing, and viewing finding aids (based on Solr).
  • Atom feed for published finding aids

EADitor is still a work in progress, but will advance more consistently now that it is officially supported by the American Numismatic Society. Ultimately, I would like to integrate other controlled vocabulary services. One of the most important issues to address moving forward is better documentation.

MORE INFORMATION

EADitor project site (Google Code): http://code.google.com/p/eaditor/
Installation instructions (specific to Ubuntu but broadly applicable to all Unix-based systems): http://code.google.com/p/eaditor/wiki/UbuntuInstallation
Google Group: http://groups.google.com/group/eaditor