Thursday, May 26, 2011

Towards georeferencing archival collections

One of the most effective ways to associate objects in archival collections with related objects is with controlled access terms: personal, corporate, and family names; places; subjects. These associations are meaningless if the terms are chosen arbitrarily. To a machine comparing text strings alone, "Thomas Jefferson" and "Jefferson, Thomas" are not the same individual. While EADitor has incorporated authorized headings from LCSH and local vocabulary (scraped from terms found in EAD files currently in the eXist database) almost since its inception, it has not until recently interacted with other controlled vocabulary services. Interacting with EAC-CPF and geographical services is high on the development priority list.

Over the last week, I have been working on incorporating geonames queries into the XForms application. Geonames provides stable URIs for more than 7.5 million place names internationally. XML representations of each place are accessible through various REST APIs. These XML datastreams also include the latitude and longitude, which will make it possible to georeference archival collections as a whole or individual items within collections (an item-level indexing strategy will soon be offered in EADitor as an alternative to traditional, collection-based indexing).
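To make the shape of those XML datastreams concrete, here is a minimal Python sketch (EADitor itself does this in XForms, not Python). It builds a URL for the geonames search service and pulls the name, geonameId, latitude and longitude out of a response. The sample response is trimmed by hand to the fields used here, and the function names search_url and parse_places are my own, not part of EADitor or geonames.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hand-trimmed sample of a search response; real responses carry more fields.
SAMPLE_RESPONSE = """<geonames>
  <totalResultsCount>1</totalResultsCount>
  <geoname>
    <name>London</name>
    <lat>51.50853</lat>
    <lng>-0.12574</lng>
    <geonameId>2643743</geonameId>
    <countryCode>GB</countryCode>
  </geoname>
</geonames>"""


def search_url(place, username="demo", max_rows=10):
    """Build a geonames search URL. No request is sent here."""
    query = urlencode({"q": place, "maxRows": max_rows, "username": username})
    return "http://api.geonames.org/search?" + query


def parse_places(xml_text):
    """Return one dict per <geoname> element in a search response."""
    root = ET.fromstring(xml_text)
    places = []
    for node in root.findall("geoname"):
        places.append({
            "name": node.findtext("name"),
            "geonameId": node.findtext("geonameId"),
            "lat": float(node.findtext("lat")),
            "lng": float(node.findtext("lng")),
        })
    return places


if __name__ == "__main__":
    print(search_url("London"))
    for place in parse_places(SAMPLE_RESPONSE):
        print(place)
```

The geonameId is kept as a string because it is an identifier, not a quantity; only the coordinates are converted to numbers.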

The new interface in EADitor enables either the querying of geonames or the selection of user-entered terms as part of a localized controlled vocabulary, which is driven by autosuggest and Solr TermsComponent (identical to the LCSH functionality).

The picture above shows an example of the two interfaces. The first enables the user to query the place name. Clicking on the "Search" button sends an xforms:submission to the geonames API, which returns applicable places in a list. Selecting one of the places populates the element with the name, sets the @source to "geonames", and sets the @authfilenumber to the appropriate geonameId returned from the query. Storing the geonameId makes it possible to query the geonames APIs again when indexing the finding aid into Solr, to gather the geographic coordinates of the place and any changes to the place name.
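As a rough illustration of what a selection leaves behind in the EAD, the Python sketch below builds an element carrying the place name, @source and @authfilenumber, and then derives a lookup URL from the stored id. Two assumptions are mine: the element name geogname is inferred from the geogname.xbl component mentioned later in this post, and the geonames "get" service is just one endpoint that accepts a geonameId (which endpoint EADitor actually calls at index time is not stated here). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def make_geogname(name, geoname_id):
    """Build the element a selection would populate (element name assumed)."""
    element = ET.Element("geogname")
    element.set("source", "geonames")
    element.set("authfilenumber", str(geoname_id))
    element.text = name
    return element


def lookup_url(element, username="demo"):
    """Build a URL for re-querying a place by its stored geonameId.

    Assumes the geonames "get" service; no request is sent here.
    """
    query = urlencode({
        "geonameId": element.get("authfilenumber"),
        "username": username,
    })
    return "http://api.geonames.org/get?" + query


if __name__ == "__main__":
    place = make_geogname("London", 2643743)
    print(ET.tostring(place, encoding="unicode"))
    print(lookup_url(place))
```

Because the id rather than the name is the key, a later rename of the place in geonames does not break the link between the finding aid and its coordinates.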

EADitor now includes a KML querying service, similar to Numishare's, which is detailed on the numishare blog. With OpenLayers, a KML representation of the current search is rendered in the form of a map, pictured above. The mapping component within EADitor is currently simple, but can be expanded in the future. Ultimately, a map-driven query interface like that of MANTIS can be integrated into the application with fairly minimal effort (since both EADitor and Numishare are XSLT/Solr/Ajax driven).
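For readers unfamiliar with KML, this is roughly what such a representation contains. The Python sketch below is not EADitor's implementation (that is XSLT over Solr results); it only shows the structure a mapping library such as OpenLayers would read: one Placemark per place, with a Point whose coordinates are written longitude first. The function name places_to_kml and the input dictionaries are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def places_to_kml(places):
    """Serialize places (dicts with name, lat, lng) as a KML document string."""
    # Emit KML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element("{%s}kml" % KML_NS)
    document = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for place in places:
        placemark = ET.SubElement(document, "{%s}Placemark" % KML_NS)
        ET.SubElement(placemark, "{%s}name" % KML_NS).text = place["name"]
        point = ET.SubElement(placemark, "{%s}Point" % KML_NS)
        # KML coordinate order is longitude,latitude (not lat,lng).
        coordinates = ET.SubElement(point, "{%s}coordinates" % KML_NS)
        coordinates.text = "%s,%s" % (place["lng"], place["lat"])
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    sample = [{"name": "London", "lat": 51.50853, "lng": -0.12574}]
    print(places_to_kml(sample))
```

The longitude-first ordering is the detail most often gotten wrong when moving from geonames fields (lat, lng) to KML.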

So how do you get this new functionality in your already-installed .1105 beta?

  1. Follow the subversion update instructions here.
  2. Since Solr configuration files have changed, restart Apache Tomcat.
  3. Register a free geonames user account to be able to submit API queries 30,000 times per day. The EADitor code uses the "demo" username to query, which provides only limited access to the data. Edit geogname.xbl, and replace "demo" with your registered username. This file is located at TOMCAT_HOME/webapps/orbeon/WEB-INF/resources/xbl/eaditor/geogname/geogname.xbl. You will have to replace it twice, once in each of the xforms:submissions near the bottom of the file.

With these steps complete you will be able to begin georeferencing your archival collections!
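If you would rather script the username change in step 3 than edit by hand, something like the following Python sketch could work. It assumes that "demo" appears in geogname.xbl as a username=demo query parameter, which I have not spelled out above, so check the file first. The function name and the sample markup are hypothetical. Matching the full parameter rather than the bare word avoids touching any other "demo" in the file.

```python
import re


def set_geonames_username(xbl_text, username):
    """Swap username=demo for username=<username>.

    Returns (new_text, number_of_replacements).
    """
    pattern = re.compile(r"(username=)demo\b")
    # A function replacement keeps any backslashes in the username literal.
    return pattern.subn(lambda match: match.group(1) + username, xbl_text)


if __name__ == "__main__":
    # Hypothetical stand-in for the two submissions near the end of the file.
    sample = (
        '<xforms:submission resource="http://example.org/a?username=demo"/>\n'
        '<xforms:submission resource="http://example.org/b?username=demo"/>\n'
    )
    new_text, count = set_geonames_username(sample, "my_account")
    print(count)
    print(new_text)
```

A replacement count other than two would suggest the file differs from what is described above and should be inspected by hand.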
