Friday, December 2, 2011

EADitor December 2011 (.1112) beta + Improved Documentation

There are two major updates to report:

1.  The latest beta packages have been released to the Google Code download section.

Features (apart from creating, editing, publishing, and deleting EAD finding aids):

  • Public interface with faceted search results and facet-based OpenLayers mapping 
  • Linked data and geographic services: OAI-PMH feed, Solr-based Atom feed (embedded with geographic points) and search results in the form of KML
  • Geonames, LCSH, VIAF APIs for geographic, subject term, personal, and corporate name controlled vocabulary
  • Upload finding aids from the "wild" (if they adhere to EAD 2002).
  • Interface for reordering and setting permissions of components
  • Flickr API integration: attach Flickr images as a daogrp
  • Simple template controls for EAD finding aids
  • Introduction of simple themes: select the facet orientation on the search page and choose from a selection of jQuery UI themes (theme controls will be enhanced over time)

2.  There is a new documentation wiki: http://wiki.numismatics.org/eaditor:eaditor.  Documenting EADitor will be a gradual process; I plan to devote several hours per week to the task.  The wiki will include instructions for installation, use, and eventually, customization (including Fedora and DSpace integration).

Installation Instructions
Generic installation instructions can be found here. Ubuntu-specific instructions can also be found on the wiki.

Friday, November 4, 2011

American Numismatic Society Unveils ARCHER: A Breakthrough in Archival Cataloging

New York, NY - American Numismatic Society today unveiled the new interactive ANS archival search tool, ARCHER (numismatics.org/archives/). ARCHER (ARCHival Electronic Resource) provides access to the ANS’ unrivalled collection of unique archival material, through a series of simple search screens.

ARCHER is powered by EADitor, open source software developed by the ANS for creating, managing, and publishing collections of archival finding aids described in Encoded Archival Description (EAD) XML. Like the Society’s Collection database, MANTIS, ARCHER's public interface includes faceted searching and mapping of the collection. The administrative interface takes advantage of XForms, next-generation web forms, which enable the Society’s archivist, David Hill, to create the finding aids with an intuitive user interface that requires no technical knowledge of XML, a major advancement over many institutions' current workflows.

“The Society’s archives contains a rich diversity of material”, notes Deputy Director Andrew Meadows. “On the one hand there are the Society’s own archival records, spanning more than 150 years of our activity since 1858. As such, the archive is one of the oldest held by any learned society in the country. But in addition to that, the ANS also holds a remarkable archive of international significance, consisting of the papers of famous scholars, collectors and dealers in the field of numismatics. Documents, letters, notebooks and photographs relating to many famous individuals from within the numismatic community and beyond are held at the ANS. ARCHER provides a remarkably straightforward tool both for their cataloging, and for their search by the general public.”

The search interface is built on similar technology to the Society’s MANTIS database. Simple faceted searches based on personal names, places, dates, genre and Library of Congress subject headings are all provided. ARCHER also provides a map interface, which will allow users to visualize the places on the globe with relevant ANS archival material.

ANS Archivist David Hill explains, “What really sets ARCHER apart from other archives management systems is that it combines a simple method for the creation of full EAD with a powerful, built-in publishing feature that produces remarkably sophisticated finding aids. The faceted search capabilities give us an enhanced level of control over our archival holdings, ensuring that ANS staff and researchers can always find relevant materials from across our various collections.”

The design of ARCHER is the work of ANS Web Developer Ethan Gruber, working in close collaboration with David Hill. The database supports a variety of export formats that will encourage exploration of the links between numismatics and other disciplines. As with MANTIS, underlying this work is a technical approach called ‘Linked Open Data’. For the future this will mean increased opportunities both for the ANS to integrate searches across the whole range of its collections, such as books, coins, archives, and also for the rest of the world wide web to discover and link to our material. As Meadows notes, “The new suite of ANS search tools, comprising DONUM (the library catalog), MANTIS and ARCHER, will bring numismatic material to whole new audiences. In this the ANS is leading the world.”

For more information, contact ANS Deputy Director Andrew Meadows (212) 571-4470 ext. 111, meadows@numismatics.org.

The American Numismatic Society, organized in 1858 and incorporated in 1865 in New York State, operates as a research museum under Section 501(c)(3) of the Internal Revenue Code and is recognized as a publicly supported organization under section 170(b)(1)(A)(vi) as confirmed on November 1, 1970.

Bringing the ANS archives to a larger audience

Just a quick update on recent happenings...

Over the last several days, I have been working on creating an OAI-PMH service layer in Archer, the ANS's implementation of EADitor.  It is complete and passed all of the OAI validation tests a few minutes ago.  Openarchives.org has successfully registered it.  Like the Atom feed, the OAI service is generated dynamically from Solr search results.  Archer's OAI service will be added to the EADitor general distribution trunk soon, so you'll see it in the next beta release (coming by the end of the year)!
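
For anyone who wants to harvest the feed, here is a minimal Python sketch of the client side of an OAI-PMH exchange. The base URL is a placeholder (this post does not give the endpoint), and the sample response is invented for illustration. The `oai_dc` prefix is used because the OAI-PMH protocol requires every repository to support it; whether Archer offers other formats is not stated here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample of a ListRecords response, trimmed to the record headers.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:findingaid-1</identifier></header></record>
    <record><header><identifier>oai:example.org:findingaid-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: it replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Return the identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, read the resumptionToken from the response, and loop until no token is returned.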

Thursday, October 20, 2011

Mapping Archival Collections via Atom

As mentioned in a previous blog post, EADitor enables georeferencing of archival collections, either as a whole or perhaps on the component level, by tapping into APIs provided by geonames.org.  A mapping interface solicits user input of facets to query EADitor's built-in Solr-to-KML service to render results in the form of an OpenLayers map.

Currently, only one collection in the American Numismatic Society Archives has been georeferenced, but we expect to add more geographic locations in the coming weeks and months as we continue to refine the collections' descriptions.  The more places that have been added into the Solr index, the more useful the Ajax-driven facet querying interface will be.

On top of this, GML points have been added into the Atom feed for the collection.  Like KML, the Atom feed is driven by Solr and can be queried with the Lucene syntax.  Google Maps is capable of rendering an Atom feed which contains GML points, which you can see here.
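
As a rough illustration of what a consumer of such a feed does, the Python sketch below pulls the points out of an Atom document. It assumes the points are encoded GeoRSS-style (a `georss:where` element wrapping `gml:Point/gml:pos`, with latitude before longitude); this post only says "GML points", so the exact markup in EADitor's feed may differ. The sample feed is invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gml": "http://www.opengis.net/gml",
}

# Invented sample feed; the georss:where wrapper is an assumption.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"
      xmlns:gml="http://www.opengis.net/gml">
  <entry>
    <title>Example finding aid</title>
    <georss:where><gml:Point><gml:pos>40.7143 -74.006</gml:pos></gml:Point></georss:where>
  </entry>
  <entry><title>Finding aid without a place</title></entry>
</feed>"""


def entry_points(feed_xml):
    """Return (title, latitude, longitude) for every gml:pos in the feed."""
    root = ET.fromstring(feed_xml)
    points = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        for pos in entry.findall(".//gml:pos", NS):
            # GeoRSS GML positions are written "latitude longitude".
            lat, lon = (float(value) for value in pos.text.split())
            points.append((title, lat, lon))
    return points


if __name__ == "__main__":
    for title, lat, lon in entry_points(SAMPLE_FEED):
        print(title, lat, lon)
```

Entries without a point are simply skipped, which matches the situation described above where only some collections have been georeferenced.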

The ANS code is about a week ahead of what exists in the Google Code trunk for the EADitor general distribution, but I will work on integrating these new mapping features into the trunk before the beta release that should come in early November.

Thursday, September 29, 2011

A Brief Description of Recent Updates

The American Numismatic Society's finding aid collection (an ANS implementation of EADitor) is tentatively set to be released in late October, and I am planning a major release of the EADitor general distribution beta to coincide with this event.  Over the last few weeks, I have been making some improvements to the code, several of which I will discuss briefly in this post.  A number of these improvements have been borrowed from Numishare, which is our software for managing and publishing collections of coins and similar objects.  Like EADitor, Numishare's back-end is built on top of XForms and the server-side XForms processor, Orbeon, so much of the code is interchangeable.

The list:
  1. A major effort to remove hard-coded URLs to eXist and Solr, placing them, instead, into configuration files that can be edited by an XForms web form.
  2. Introduction of themes configured through a web form.  Though rudimentary so far, they are documented here.  Currently, the user can select from a list of available jQuery UI themes and select the layout of the search results page: facet list in a left or right-aligned column.
  3. Flickr API integration.
  4. The introduction of a new Solr core for storing docs of every finding aid in eXist (including those not published), so that the file list page in the admin section loads faster and supports full-text searching of the collection.  I will eventually develop methods for more advanced searching and sorting of records in the admin page.  Each time the file list page is loaded, it verifies that the number of documents in eXist matches the number in Solr; if they differ (for example, because finding aids were added or removed with the eXist client), the Solr index is flushed and the collection is reindexed.
  5. Batch publication of all finding aids to the public Solr index and batch removal of all documents from this index.
  6. The ability to edit a container-type template.
  7. Significant improvements in performance from rewriting some XBL components in which a glitch in my code resulted in commands being fired off numerous times, plus other general bug fixes.
There aren't many more improvements or features that I would like to add to the code before the next major release in several weeks, but I would like to begin writing documentation for use and testing with Fedora repositories.
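
The count check described in item 4 of the list can be sketched in a few lines of Python. This is only an illustration of the logic, not EADitor's code (which lives in XForms/XPL): the index is modeled as a plain dictionary, and `sync_admin_index` and `reindex_all` are hypothetical names.

```python
def reindex_all(doc_ids, index):
    """Stand-in for posting every eXist document to the admin Solr core."""
    for doc_id in doc_ids:
        index[doc_id] = {"id": doc_id}


def sync_admin_index(exist_doc_ids, solr_index, reindex=reindex_all):
    """Flush and rebuild the index when the two document counts differ.

    Returns True if a reindex was triggered, False if the counts agreed.
    """
    if len(exist_doc_ids) == len(solr_index):
        return False  # counts agree, so the index is assumed to be current
    solr_index.clear()  # flush the stale index
    reindex(exist_doc_ids, solr_index)
    return True


if __name__ == "__main__":
    index = {"a": {"id": "a"}}
    # A second finding aid was added directly with the eXist client.
    print(sync_admin_index(["a", "b"], index))  # triggers a reindex
    print(sorted(index))
```

Comparing counts is cheap, which suits a check that runs on every page load, but it cannot detect one document being added while another is removed; comparing identifiers would catch that case at a higher cost.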






Screenshot showing "sunny" jQuery UI theme and a thumbnail served from flickr.  The banner text "EADitor: XForms for EAD" is stored in the configuration file and editable in the admin section.

Monday, August 22, 2011

EADitor featured in SAA Description Expo 2011

EADitor was recently honored by making the short list of archive and EAD-related projects at this year's Description Expo of the Society of American Archivists.

Read more about the projects!

Monday, June 27, 2011

Permissions Editing Simplified

EAD allows for an audience attribute to be set for any element within the schema. There are two options to select for the audience, internal and external, used to designate user permission to the content. By default, the lack of the audience attribute signifies that content of the element (and its descendants) is open to the public. Sometimes it is necessary for a finding aid to contain components that are for internal use and not intended to be seen by end users. Perhaps an entire series within an archival collection contains private materials--financial or membership records--but within this series are a few items which may be publicly highlighted.

EADitor now enables the editing of any of the attributes for components, including the setting of audiences.


The attribute popup pictured above is shown upon clicking the "@" link adjacent to the "Edit Component" heading at the top of the image. This link is also visible next to each EAD component under the Subcomponents tab. This popup window is an XBL component, and can very easily be applied to any element within the XForms application. The XBL file parses the EAD 2002 schema to look up the name of the element and extract all attributes and attribute groups associated with the element to dynamically populate the window (see code here).

Moreover, there is now a specific interface for editing component permissions, accessible from the administrative page (code):

Components with their audience attribute set to "internal" appear orange, while "external" ones appear green. Components that lack the audience attribute inherit the audience of their parent; top-level series that lack it are visible by default.
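
The inheritance rule can be expressed as a short recursive walk. The Python sketch below is illustrative only: it uses a namespace-free sample document, assumes every component carries an id attribute, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# EAD components are named c or c01 through c12.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

# Invented sample; a real EAD 2002 schema document would be namespaced.
SAMPLE_DSC = """<dsc>
  <c01 id="series1" audience="internal">
    <did><unittitle>Membership records</unittitle></did>
    <c02 id="item1"><did><unittitle>Ledger</unittitle></did></c02>
    <c02 id="item2" audience="external"><did><unittitle>Medal design</unittitle></did></c02>
  </c01>
  <c01 id="series2"><did><unittitle>Correspondence</unittitle></did></c01>
</dsc>"""


def effective_audiences(parent, inherited="external", result=None):
    """Map each component id to its explicit or inherited audience."""
    result = {} if result is None else result
    for child in parent:
        if COMPONENT.match(child.tag):
            # An explicit attribute wins; otherwise use the parent's value.
            audience = child.get("audience", inherited)
            result[child.get("id")] = audience
            effective_audiences(child, audience, result)
    return result


if __name__ == "__main__":
    for component_id, audience in effective_audiences(ET.fromstring(SAMPLE_DSC)).items():
        print(component_id, audience)
```

Starting the walk with "external" as the inherited value reflects the default visibility of top-level series.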

The XSLT stylesheets to render finding aids into public XML and HTML documents were modified to put these permissions into practice. The stylesheets packaged with the EADitor general distribution will show only the unit title of an internal component if it contains an external descendant; otherwise, internal components are hidden entirely. These stylesheets can be modified fairly easily to show containers, unit dates, or other desired elements.
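
The display rule applied by the packaged stylesheets can be summarized as a three-way decision. The real implementation is XSLT; the Python below is only a sketch of the same logic on an invented, namespace-free sample, with hypothetical function names and mode labels.

```python
import re
import xml.etree.ElementTree as ET

# EAD components are named c or c01 through c12.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")

SAMPLE_DSC = """<dsc>
  <c01 id="series1" audience="internal">
    <c02 id="item1"/>
    <c02 id="item2" audience="external"/>
  </c01>
  <c01 id="series2"/>
  <c01 id="series3" audience="internal"><c02 id="item3"/></c01>
</dsc>"""


def subcomponents(element):
    return [child for child in element if COMPONENT.match(child.tag)]


def has_external_descendant(component):
    """True if any component below this one is explicitly external."""
    return any(
        child.get("audience") == "external" or has_external_descendant(child)
        for child in subcomponents(component)
    )


def display_mode(component, inherited="external"):
    """Return 'full', 'title-only', or 'hidden' for one component."""
    if component.get("audience", inherited) == "external":
        return "full"
    # Internal: keep the unit title only as context for public descendants.
    return "title-only" if has_external_descendant(component) else "hidden"


if __name__ == "__main__":
    for series in subcomponents(ET.fromstring(SAMPLE_DSC)):
        print(series.get("id"), display_mode(series))
```

When walking deeper levels, the caller would pass the parent's effective audience as `inherited`, so that an unmarked item inside an internal series is treated as internal.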


Ultimately, EADitor should accommodate the modification of the audience for any element in EAD--not just components as a whole--but such a feature is a delicate balancing act between usability and function. Setting the permissions at a more granular level should not interfere with the ease of the user interface. Implementation of more sophisticated access controls is ultimately dependent on user demand.

Thursday, June 9, 2011

Improving Authority Control in EADitor

Having just returned Sunday from LOD-LAM, pumped up on linked open data and having learned about all sorts of cool stuff, I was eager Monday to get started on improving authority control in EADitor (much of which will later be applied to the Numishare general distribution). I learned of VIAF, an OCLC corporate and personal name authority service, and set off to integrate this service into EADitor's XBL components in the same way that one searches geonames for the EAD geogname element or id.loc.gov for LCSH terms for subject.



I was able to get this working in just a few short hours the other day thanks to VIAF's well-documented APIs. As with geonames, the EADitor user can still select arbitrary terms from local authority lists or input new names, but searching VIAF will go a long way toward tightening controlled access terms in EAD finding aids. Moreover, it will enable direct linking to various services from the finding aid display page itself, and those services often provide external links to other useful sites, like Wikipedia or WorldCat.


The links to terms listed above point the user to EADitor's search results page for that particular query. The image to the right of the term links the user to the URI for that term, hosted by the appropriate service, e.g., http://www.geonames.org/4811020/ for Kirby (W.Va.) or http://viaf.org/viaf/126085739/ for the American Numismatic Society. Also of note here is another significant change. The results from geonames are transformed into AACR2-compliant place names, with the exception of Malaysia since I have not yet been able to find a list of standard abbreviations for its territories.

I also see another use for controlled vocabulary service integration. Third parties will be able to query the EAD collection for finding aids which contain particular terms. For example, the VIAF record for Thomas Jefferson (http://viaf.org/viaf/41866059/) could link back to an institution's search results page for finding aids containing @authfilenumber '41866059' from @source 'viaf'.
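
To make the idea concrete, here is a Python sketch of how a third party might build such a link. Everything specific in it is an assumption: the results URL is a placeholder, and the Solr field naming (`viaf_authfilenumber`) is hypothetical, since this post does not say how EADitor indexes @source and @authfilenumber.

```python
from urllib.parse import urlencode


def authority_query(source, authfilenumber):
    """Build a Lucene query for one authority identifier.

    The field name pattern "<source>_authfilenumber" is hypothetical.
    """
    if not authfilenumber.isdigit():
        raise ValueError("expected a numeric authority file number")
    return f'{source}_authfilenumber:"{authfilenumber}"'


def results_url(base_url, source, authfilenumber):
    """Return a link to a (placeholder) search results page for that identifier."""
    return base_url + "?" + urlencode({"q": authority_query(source, authfilenumber)})


if __name__ == "__main__":
    # The identifier is the VIAF number for Thomas Jefferson cited above.
    print(results_url("http://example.org/results", "viaf", "41866059"))
```

Because the link is keyed on the identifier rather than the heading text, it keeps working even if the institution displays the name in a different form.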

After all, better metadata is the foundation for better information systems.

NOTE: the code is available below:

http://code.google.com/p/eaditor/source/browse/trunk/xbl/eaditor/corpname/corpname.xbl
http://code.google.com/p/eaditor/source/browse/trunk/xbl/eaditor/geogname/geogname.xbl
http://code.google.com/p/eaditor/source/browse/trunk/xbl/eaditor/persname/persname.xbl


Thursday, May 26, 2011

Towards georeferencing archival collections

One of the most effective ways to associate objects in archival collections with related objects is with controlled access terms: personal, corporate, and family names; places; subjects. These associations are meaningless if chosen arbitrarily. With respect to machine processing, Thomas Jefferson and Jefferson, Thomas are not seen as the same individual when judging by the textual string alone. While EADitor has incorporated authorized headings from LCSH and local vocabulary (scraped from terms found in EAD files currently in the eXist database) almost since its inception, it has not until recently interacted with other controlled vocabulary services. Interacting with EAC-CPF and geographical services is high on the development priority list.

geonames.org


Over the last week, I have been working on incorporating geonames.org queries into the XForms application. Geonames provides stable URIs for more than 7.5 million place names internationally. XML representations of each place are accessible through various REST APIs. These XML datastreams also include the latitude and longitude, which will make it possible to georeference archival collections as a whole or individual items within collections (an item-level indexing strategy will be offered in EADitor as an alternative to traditional, collection-based indexing soon).

The new interface in EADitor enables either the querying of geonames or the selection of user-entered terms as part of a localized controlled vocabulary, which is driven by autosuggest and Solr TermsComponent (identical to the LCSH functionality).
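
For readers unfamiliar with TermsComponent, an autosuggest lookup boils down to a prefix query against the terms of one indexed field. The Python sketch below builds such a request. The `terms`, `terms.fl`, `terms.prefix`, and `terms.limit` parameters are standard TermsComponent parameters; the Solr URL, the `/terms` handler path (which must be configured in solrconfig.xml), and the field name are assumptions, not EADitor's actual settings.

```python
from urllib.parse import urlencode


def terms_suggest_url(solr_core_url, field, prefix, limit=10):
    """Build a TermsComponent request returning indexed terms that start with prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # what the user has typed so far
        "terms.limit": limit,     # maximum number of suggestions
        "wt": "json",
    }
    return f"{solr_core_url}/terms?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URL and field name.
    print(terms_suggest_url("http://localhost:8080/solr", "geogname_facet", "New"))
```

Prefix matching runs against the terms as indexed, so the analysis applied to the field determines whether suggestions are case-sensitive.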



In the above picture is an example of the two interfaces. The first enables the user to query the place name. Clicking on the "Search" button sends an xforms:submission to geonames.org's API to return applicable places in a list. Selection of one of the places will populate the element with the name, set the @source to "geonames" and set the @authfilenumber to the appropriate geonameId returned from the query. Storing the geonameId enables further query of geonames APIs to gather the geographic coordinates of the place and any changes to the place names upon indexing the finding aid into Solr.
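
The round trip described above can be sketched in Python. This is not the XForms submission itself: the request URL follows the public GeoNames search web service as I understand it (the host EADitor calls may differ), the sample response is trimmed and hand-written, and the function names are hypothetical. The geonameId is the one for Kirby cited elsewhere on this blog.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hand-written sample shaped like a GeoNames search response (trimmed).
SAMPLE_RESPONSE = """<geonames>
  <totalResultsCount>1</totalResultsCount>
  <geoname>
    <name>Kirby</name>
    <geonameId>4811020</geonameId>
    <countryCode>US</countryCode>
  </geoname>
</geonames>"""


def geonames_search_url(place, username="demo", max_rows=10):
    """Build a search request; "demo" is the limited account mentioned in this post."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    return "http://api.geonames.org/search?" + urlencode(params)


def to_geogname(geoname):
    """Turn one <geoname> result into an EAD geogname element."""
    element = ET.Element(
        "geogname",
        {"source": "geonames", "authfilenumber": geoname.findtext("geonameId")},
    )
    element.text = geoname.findtext("name")
    return element


if __name__ == "__main__":
    print(geonames_search_url("Kirby"))
    first = ET.fromstring(SAMPLE_RESPONSE).find("geoname")
    print(ET.tostring(to_geogname(first), encoding="unicode"))
```

Keeping the geonameId in @authfilenumber is what allows the indexing step to look the place up again later for coordinates and updated names.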


EADitor now includes a KML querying service, similar to Numishare's, which is detailed on the numishare blog. With OpenLayers, a KML representation of the current search is rendered in the form of a map, pictured above. The mapping component within EADitor is currently simple, but can be expanded in the future. Ultimately, a map-driven query interface like that of MANTIS can be integrated into the application with fairly minimal effort (since both EADitor and Numishare are XSLT/Solr/Ajax driven).
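
For a sense of what a Solr-to-KML service has to produce, here is a small Python sketch that turns search result documents into KML placemarks. EADitor does this with XSLT; the dictionary keys (`title`, `lat`, `lon`) are hypothetical stand-ins for whatever the Solr schema actually uses.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def docs_to_kml(docs):
    """Serialize documents that have coordinates as KML placemarks."""
    # Declaring xmlns as a plain attribute puts every child in the KML namespace.
    kml = ET.Element("kml", {"xmlns": KML_NS})
    document = ET.SubElement(kml, "Document")
    for doc in docs:
        if "lat" not in doc or "lon" not in doc:
            continue  # not georeferenced yet, so nothing to map
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = doc["title"]
        point = ET.SubElement(placemark, "Point")
        # KML coordinates are written longitude first, then latitude.
        ET.SubElement(point, "coordinates").text = f"{doc['lon']},{doc['lat']}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    results = [
        {"title": "Example finding aid", "lat": 40.7143, "lon": -74.006},
        {"title": "Finding aid without coordinates"},
    ]
    print(docs_to_kml(results))
```

The longitude-first order is the detail most often gotten wrong when moving between KML and formats that put latitude first.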

So how do you get this new functionality in your already-installed .1105 beta?

  1. Follow the Subversion update instructions here.
  2. Since Solr configuration files have changed, restart Apache Tomcat.
  3. Register a free user account at geonames.org so that you can submit up to 30,000 API queries per day. The EADitor code uses the "demo" username to query geonames.org, which provides only limited access to the data. Edit geogname.xbl, and replace "demo" with your registered username. This file is located at TOMCAT_HOME/webapps/orbeon/WEB-INF/resources/xbl/eaditor/geogname/geogname.xbl. The username appears twice, in the xforms:submissions near the bottom of the file, and both occurrences must be replaced.
With these steps complete you will be able to begin georeferencing your archival collections!
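
If you would rather script the username change in step 3 than edit the file by hand, something like the following Python sketch could work. It assumes the username appears in the file as the literal text `username=demo`; I have not confirmed that here, so check the file first. The sample snippet is invented and is not the real content of geogname.xbl.

```python
import re

# Invented snippet standing in for the two submissions in geogname.xbl.
SAMPLE_XBL = (
    '<xforms:submission resource="http://example.org/search?q={q}&amp;username=demo"/>\n'
    '<xforms:submission resource="http://example.org/get?id={id}&amp;username=demo"/>\n'
)


def set_geonames_username(xbl_text, username):
    """Replace the demo account; returns (new_text, number_of_replacements)."""
    # A function is used as the replacement so the username is inserted literally.
    return re.subn(r"(username=)demo\b", lambda m: m.group(1) + username, xbl_text)


if __name__ == "__main__":
    new_text, count = set_geonames_username(SAMPLE_XBL, "your_username")
    print(count)  # step 3 says the file contains two occurrences
    print(new_text)
    # To apply it to the real file, read its text with pathlib.Path(...).read_text(),
    # call set_geonames_username, confirm the count is 2, and write the result back.
```

Checking the replacement count before saving guards against a silent no-op if the file's format differs from the assumption.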

Friday, May 13, 2011

EADitor beta .1105 released

I'm pleased to announce a new, much overdue, EADitor beta, .1105.

EADitor is an XForms framework for the creation and editing of Encoded Archival Description (EAD) finding aids using Orbeon, an enterprise-level XForms Java application, which runs in Apache Tomcat. Although the web form is certainly the most important aspect of the application since it can be integrated with existing content management and dissemination systems, EADitor also includes an easily customizable public interface for searching, sorting, and browsing collections of finding aids. This enables institutions to use a single application for content creation and publication.

FEATURES
  • Create and edit EAD finding aids adhering to the EAD 2002 schema (elements are represented at almost every level in the finding aid, with the notable exception of mixed content at the paragraph level).
  • Import EAD 2002 schema or DTD-compliant finding aids into EADitor
  • An administrative user interface for publishing/unpublishing finding aids
  • Simple component reordering interface
  • Controlled vocabulary integration with auto-suggest, including LCSH terms and local vocabularies in subject, persname, famname, corpname, geogname, and genreform. Languages also draw on a controlled vocabulary.
  • Set default templates for the EAD core and components
  • A form for setting agency codes
  • Public interface for searching, browsing, and viewing finding aids (based on Solr).
  • Atom feed for published finding aids

EADitor is still a work in progress, but will advance more consistently now that it is officially supported by the American Numismatic Society. Ultimately, I would like to integrate other controlled vocabulary services. One of the most important issues to address moving forward is better documentation.

MORE INFORMATION

EADitor project site (Google Code): http://code.google.com/p/eaditor/
Installation instructions (specific to Ubuntu but broadly applicable to all Unix-based systems): http://code.google.com/p/eaditor/wiki/UbuntuInstallation
Google Group: http://groups.google.com/group/eaditor