Thursday, March 27, 2014

Incorporating RDF relationship ontologies into xEAC

Several months ago, just after presenting the latest developments in xEAC at MARAC, I wrote about the application's enhanced relationship maintenance capabilities. At that point, relationships had to be entered into the system manually. One of the questions I received at MARAC was, basically, will xEAC be able to harvest from existing ontologies? Now, the answer is "yes."

While this is still very much a prototype (because there may be numerous ways of constructing a relationship ontology in RDF), I have successfully implemented an RDF/XML upload mechanism. The xEAC relationship maintenance section will parse the RDF/XML provided by http://vocab.org/relationship/. The XForms processor will read the relationship properties in the file, creating symmetrical or inverse relationships when applicable. It allows you to select the prefix you would like to use to define the ontology and will create the localTypeDeclaration that contains the abbreviation (the prefix) and citation (URI) if it does not already exist in the config.

Therefore, it will take some RDF that looks like this:

<rdf:Description rdf:about="http://purl.org/vocab/relationship/grandchildOf">
 <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
 <owl:equivalentClass rdf:resource="http://www.perceive.net/schemas/relationship/grandchildOf"/>
 <owl:inverseOf rdf:resource="http://purl.org/vocab/relationship/grandparentOf"/>
 <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/knows"/>
 <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2002/07/owl#differentFrom"/>
 <rdfs:label xml:lang="en">Grandchild Of</rdfs:label>
 <rdfs:label>Grandchild Of</rdfs:label>
 <skos:definition xml:lang="en">A person who is a child of any of this person's children.</skos:definition>
 <rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
 <rdfs:range rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
 <rdfs:isDefinedBy rdf:resource="http://purl.org/vocab/relationship/"/>
 <skos:historyNote rdf:nodeID="mor53341b13853b8"/>
</rdf:Description>

And turn it into this:


Once you've saved xEAC's settings, these relationships will be available through the @xlink:arcrole in CPF Relations in the EAC-CPF form.


Of course, after you establish a relationship between your source and target person, family, or corporate body record, the target EAC-CPF record will be updated with the symmetrical/inverse relationship which points back to the source. These relationships will be expressed in RDF output generated by xEAC.
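As a rough illustration of the harvesting step described in this post, here is a minimal Python sketch that reads rdf:Description elements, abbreviates each property URI with a chosen prefix, and records the owl:inverseOf target when present. This is not the XForms code that xEAC actually runs; the function names, the "rel" prefix, and the trimmed sample document are assumptions for demonstration, and symmetrical properties are not handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Trimmed version of the grandchildOf description shown above,
# wrapped in an rdf:RDF root so that it parses on its own.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://purl.org/vocab/relationship/grandchildOf">
    <owl:inverseOf rdf:resource="http://purl.org/vocab/relationship/grandparentOf"/>
    <rdfs:label xml:lang="en">Grandchild Of</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def to_curie(uri, prefix, namespace):
    """Abbreviate a URI with the prefix; return it unchanged if it is outside the namespace."""
    if uri.startswith(namespace):
        return prefix + ":" + uri[len(namespace):]
    return uri


def extract_relationships(rdf_xml, prefix, namespace):
    """Return one dict per described property: abbreviated name, label, inverse."""
    root = ET.fromstring(rdf_xml)
    relations = []
    for desc in root.iter("{%s}Description" % RDF):
        about = desc.get("{%s}about" % RDF)
        if about is None:
            continue  # skip blank nodes
        label = desc.find("{%s}label" % RDFS)
        inverse = desc.find("{%s}inverseOf" % OWL)
        relations.append({
            "relation": to_curie(about, prefix, namespace),
            "label": label.text if label is not None else None,
            "inverse": (to_curie(inverse.get("{%s}resource" % RDF), prefix, namespace)
                        if inverse is not None else None),
        })
    return relations


# "rel" is a hypothetical prefix chosen for this example.
relations = extract_relationships(SAMPLE, "rel", "http://purl.org/vocab/relationship/")
```

Each resulting dict pairs a relation with its inverse, which is the information needed to write the reciprocal relationship into the target record.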

Wednesday, March 26, 2014

Serializing EAC-CPF into CIDOC CRM

xEAC supports a fairly rudimentary RDF/XML output by appending '.rdf' onto a URI for an entity. There is an RDF ontology based on EAC-CPF, but I am not sure it has seen wide usage (it will eventually be implemented in xEAC, regardless). The RDF model employed in xEAC out of the box is little more than a proof of concept, a placeholder until a more standard model emerges from the archival community. It is based slightly on Aaron Rubinstein's arch ontology and contains little more than labels for name entries, relations (from CPF relations that contain an RDF predicate in the @xlink:arcrole), and a dcterms:abstract derived from the EAC-CPF abstract element.

There has been some use of CIDOC CRM to model people. Much of this work has been done by Michele Pasin and John Bradley at King's College London (see their paper). I am heading to London next week for the first meeting of the Standards for Networking Ancient Prosopographies project, and I suspect I will hear much more about their work in this regard there. In order to reach the broadest audience, I am making EAC-CPF data available through xEAC, serialized into CIDOC CRM. This is no easy task, but I have gotten the ball rolling a little bit. I will make more progress once I learn more about the model at the SNAP meeting.

The great advantage of the CRM is that it is very generalizable, so it can be used to model almost anything. This is a double-edged sword, however: it is so generalizable that a complicated model is sometimes necessary to communicate a relatively simple concept.

Exist Dates

In EAC-CPF, the date range of existence occupies about four lines of XML, and the @standardDate, @notBefore, and @notAfter attributes communicate ISO standard dates and some semantic certainty (or uncertainty). These can be modeled in CRM, but in a more complicated fashion. First, a person (or family or organization) was P92i_was_brought_into_existence_by an E63_Beginning_of_Existence which P4_has_time-span designated by an E52_Time-Span which has a human-readable rdfs:label and machine readable P82a_begin_of_the_begin and P82a_end_of_the_begin. @notBefore is P82a_begin_of_the_begin and @notAfter is P82a_end_of_the_begin. That's for an eac:fromDate. An eac:toDate is modeled in parallel, with End_of_Existence and begin_of_the_end and end_of_the_end. The beginning and end of existence can have a place as well, but there are some difficulties in translating birth and death places from EAC-CPF into CRM in this regard.

First, the semantics of an exist date are a bit fuzzy. By definition the existDates are "The dates of existence of the entity being described, such as dates of establishment and dissolution for corporate bodies and dates of birth and death or floruit for persons." The only way to distinguish between the birth and death dates of a person and the floruit is by using the localType attribute, and the values of @localType may vary from project to project. Therefore, if the entity being described is a person and the existDates are of his or her birth and death, then I should be using properties related to E67_Birth instead of the more generic E63_Beginning_of_Existence (of which E67_Birth is a subclass). Instead, I must opt for the more generic class. The same goes for organizations. Ultimately, the solution to this problem is to implement in the xEAC editing interface a checkbox for inserting a @localType designating whether the existDates are of the life or floruit of the person or organization (e.g., @localType='xeac:birth' or 'xeac:death'). The same goes for the place of birth or death. That way the XSLT stylesheets can read the system-based localType attribute and construct the CIDOC CRM model accordingly, and allow for variation between the exist dates for persons or corporate bodies.
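The decision described above could be sketched as follows. This is a Python illustration of the logic only, not the XSLT used in xEAC. The 'xeac:birth' and 'xeac:death' values come from the proposal in the paragraph above; the function name and the use of E69_Death as the death-specific counterpart of E67_Birth are assumptions.

```python
# Specific CRM event classes keyed by the proposed system-controlled @localType.
BEGIN_CLASSES = {"xeac:birth": "E67_Birth"}
END_CLASSES = {"xeac:death": "E69_Death"}  # assumption: counterpart of E67_Birth


def crm_event_class(local_type, is_start):
    """Pick the CRM class for a fromDate (is_start=True) or a toDate (is_start=False).

    Falls back to the generic existence classes when @localType is missing
    or is a project-specific value that cannot be interpreted reliably.
    """
    if is_start:
        return BEGIN_CLASSES.get(local_type, "E63_Beginning_of_Existence")
    return END_CLASSES.get(local_type, "E64_End_of_Existence")


# A fromDate flagged as a birth gets the specific class ...
birth_class = crm_event_class("xeac:birth", True)
# ... while an unflagged fromDate keeps the generic one.
generic_class = crm_event_class(None, True)
```

Keeping the generic classes as the fallback means records from projects with their own @localType vocabularies still serialize, just less precisely.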

This is something I will continue to wrestle with over the coming weeks, but eventually I hope to have a fully compatible crosswalk between EAC-CPF and both CIDOC CRM and TEI. CRM includes properties for relating children with parents, but arguably these types of relationships should be maintained in a separate ontology built specifically for relations. In fact, there could be many relationship ontologies, depending on the needs of the project. This I am sure will be a topic of discussion at SNAP.


Tuesday, March 25, 2014

Exporting EAC-CPF to TEI

As indicated in the TO-DO list in the recent xEAC beta announcement, and as part of the design specifications for our IMLS grant application for the further development of xEAC (and the creation of a prosopographical dataset of the Roman Empire), I have implemented a basic EAC-CPF-to-TEI transformation. It isn't yet a complete crosswalk, but it handles the following:

  • name entries
  • biographical description
  • the generation of a chronological list of events (including descriptions with normalized dates and places that link to either Pleiades or Geonames)
  • a list of relations, implementing semantic relationships defined in the @xlink:arcrole of the cpfRelation element.

There is now a link on the HTML page for an entity record to the TEI export. One can access this alternate model by appending '.tei' to the URI, e.g., http://admin.numismatics.org/xeac/id/id/alexander_the_great.tei.

The XSLT stylesheet is available on GitHub at https://github.com/ewg118/xEAC/blob/master/ui/xslt/serializations/eac/tei.xsl.

The model is based upon the TEI Prosopography documentation and some examples in the Lexicon of Greek Personal Names.

Monday, March 24, 2014

Further AAT Integration into EAC-CPF and EAD

xEAC

Like occupations, the function element in EAC-CPF has also been hooked into the Getty AAT via XForms-based SPARQL queries. The top concept for the function facet in the AAT is http://vocab.getty.edu/aat/300054593. According to Kathleen Roe on the EAD listserv:

The Getty functions vocabulary was built from an analysis of government functions in the U.S. as part of the Government Records Description project (early 1990s) undertaken by 12 or so state archives, including Utah (Jeff Johnson was the state archivist at the time).

The query appears as follows:

SELECT ?c ?label {
?c a gvp:Concept; skos:inScheme aat: ;
gvp:broaderTransitive aat:300054593 ;
gvp:prefLabelGVP/xl:literalForm ?label ;
luc:term "SEARCH_QUERY*"
} LIMIT 25
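To show how the SEARCH_QUERY placeholder might be filled from user input before the query is sent, here is a minimal Python sketch. The template is the query above, verbatim (prefix declarations are omitted here, as they are in the post). The function names, the character filtering, and the endpoint URL are assumptions for illustration; xEAC itself issues the request from XForms.

```python
from urllib.parse import urlencode

# The function-facet query from the post, with its SEARCH_QUERY placeholder.
QUERY_TEMPLATE = """SELECT ?c ?label {
?c a gvp:Concept; skos:inScheme aat: ;
gvp:broaderTransitive aat:300054593 ;
gvp:prefLabelGVP/xl:literalForm ?label ;
luc:term "SEARCH_QUERY*"
} LIMIT 25"""

# Assumption: the address of the Getty SPARQL endpoint.
ENDPOINT = "http://vocab.getty.edu/sparql"


def build_function_query(term):
    """Insert the search term, keeping only letters, digits, spaces and hyphens
    so that stray quotes cannot break out of the string literal."""
    safe = "".join(ch for ch in term if ch.isalnum() or ch in " -").strip()
    return QUERY_TEMPLATE.replace("SEARCH_QUERY", safe)


def build_request_url(term):
    """URL-encode the finished query as a 'query' parameter for a GET request."""
    return ENDPOINT + "?" + urlencode({"query": build_function_query(term)})


query = build_function_query('govern"')
url = build_request_url("govern")
```

The trailing asterisk in the template turns the term into a prefix match, so typing "govern" would also match longer terms that begin with it.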

 

EADitor

I extended these functionalities to EADitor, with the code practically copied and pasted from the genreform XBL into the function and occupation components (just replacing genreform with occupation/function and updating the SPARQL query). EADitor now supports lookup mechanisms for the following controlled access terms:
  • geogname: Geonames (modern)/Pleiades (ancient)
  • genreform: AAT/LCGFT
  • function: AAT
  • occupation: AAT
  • subject: LCSH
  • persname: VIAF
  • corpname: VIAF



EADitor will eventually hook into SNAC (or whatever evolves from it) for persname, corpname, and famname, and I will extend it to hook into xEAC for linking finding aids with EAC-CPF records. xEAC already has a REST query mechanism that returns results in the form of Atom XML, so this will be pretty easy.

Thursday, March 6, 2014

xEAC beta 2014a ready for testing

I have finally gotten xEAC to a stage where I feel it is ready for wider testing (and I have updated the installation documentation). This has been a few months coming, since I had intended to release the beta shortly after MARAC in November. The xEAC documentation can be found here: http://wiki.numismatics.org/xeac:xeac

Features

  • Create, edit, publish EAC-CPF documents. Most, but not all, EAC-CPF elements are supported.
  • Public user interface migrated to Bootstrap 3 to support mobile devices.
  • Maps and timelines for visualization of life events.
  • Basic faceted search and Solr-based Atom feed in the UI.
  • Export in EAC-CPF, KML, and rudimentary RDF/XML. HTML5+RDFa available in entity record pages.
  • Manage semantic relationships between identities (http://eaditor.blogspot.com/2013/11/maintaining-relationships-in-eac-cpf.html). Target records are automatically updated with symmetrical or inverse relationships, where relevant, and relationships are expressed in the RDF output. TODO: parse relationship ontologies defined in RDF (e.g., http://vocab.org/relationship/.rdf) for use in xEAC.

REST interactions


The XForms engine interacts with the following web services to import name authorities, biographical, or geographic information:

When the OCLC linked data service supports queries by VIAF URI, I will create a lookup widget to provide lists of related bibliographic resources.

TODO list

I aim to improve xEAC over the following months and incorporate the following:
  • Finish form: Represent all EAC-CPF elements and attributes
  • Test for scalability
  • Interface with more APIs in the editing interface
  • Employ SPARQL endpoint for more sophisticated querying and visualization, automatically publish to SPARQL on EAC-CPF record save.
  • Improve public interface, especially searching and browsing
  • Incorporate social network graph visualization (see SPARQL, above)
  • Follow evolving best practices in RDF, support export in TEI for prosopographies (http://wiki.tei-c.org/index.php/Prosopography) and CIDOC-CRM.
  • Interact with SNAC or international entity databases which evolve from it.

Wednesday, March 5, 2014

Linking EAC-CPF Occupations to the Getty AAT

The occupation element in xEAC now supports a SPARQL-based lookup mechanism to link EAC-CPF records to terms defined in the newly-released linked open data Getty AAT.

I won't go into great detail about how this works in the back end, because it is basically identical to the process by which I hooked EADitor into the AAT with EAD genreform elements, which I covered in a blog post last month.

One thing to note, however, is that the xEAC occupation lookup filters for terms that contain "Agents Facet" in the gvp:parentStringAbbrev property. There are different categories of terms--object types, agents, stylistic periods, etc.--that are not semantically distinguished, but at least contain a string in a generic field which allows filtering. I hope that the Getty will move forward with a more formal representation of these facets to improve querying efficiency.

Therefore queries for occupations look something like this:

SELECT ?c ?label WHERE {
?c rdf:type gvp:Concept .
?c skos:inScheme aat: .
?c skos:prefLabel ?label .
?c luc:term "president" .
?c gvp:parentStringAbbrev ?facet 
FILTER regex(?facet, "Agents Facet") 
FILTER langMatches(lang(?label), "en")}
ORDER BY ASC(?label)
LIMIT 25
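Once a query like this returns, the lookup widget needs a list of URI and label pairs to offer the user. The Python sketch below reads the standard SPARQL JSON results format; the sample response is a hand-written placeholder (the concept URI is not a real AAT identifier), and the function name is an assumption. xEAC does this work in XForms rather than Python.

```python
import json

# Hand-written placeholder response in the SPARQL JSON results format.
# The concept URI below is illustrative only, not a real AAT identifier.
SAMPLE_RESPONSE = """{
  "head": {"vars": ["c", "label"]},
  "results": {"bindings": [
    {"c": {"type": "uri", "value": "http://vocab.getty.edu/aat/EXAMPLE"},
     "label": {"type": "literal", "xml:lang": "en", "value": "presidents"}}
  ]}
}"""


def parse_bindings(response_text):
    """Return (uri, label) pairs for the ?c and ?label variables of the query."""
    data = json.loads(response_text)
    return [(b["c"]["value"], b["label"]["value"])
            for b in data["results"]["bindings"]]


options = parse_bindings(SAMPLE_RESPONSE)
```

Each pair supplies what the form needs: the URI to store in the EAC-CPF record and the label to display.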

I plan to apply these filters to the LOD thesaurus editor for kerameikos.org in order to provide a more accurate list of style periods, pottery techniques, wares, and shapes for linking kerameikos URIs to Getty AAT identifiers. For example, "Black Figure" is defined by the Getty as both a technique and a style or period, so "Black Figure" on kerameikos, defined by http://kerameikos.org/ontology#Technique, should refer to the Getty's technique facet (not the style or period) for the term with owl:sameAs.

xEAC: Current and Future Work this Month

I am in the process of migrating various projects to Bootstrap 3, which greatly improves mobile support. Numishare's master branch has been migrated to Bootstrap from jQuery UI (with the exception of multiselect, which is on the agenda). I recently completed the migration of xEAC to Bootstrap (including multiselects on the browse page), and EADitor will be next. Now that I have successfully implemented Bootstrap Multiselect, I will be able to apply these changes back to Numishare. Frankly, the AJAX lookup mechanism for dynamic Solr facet terms is much simpler in Bootstrap Multiselect than in the older jQuery UI one I had been using for three years, with far less JavaScript required on my end.

While I was at it, and since I'm having Orbeon (the engine powering both the front end user interface and the back end editing in both xEAC and EADitor) output pages in HTML5, I went ahead and applied fairly basic RDFa to EAC-CPF record pages so that machine readable data can be extracted by using the W3C distiller.

I will be traveling to London at the end of this month to participate in the Standards for Networking Ancient Prosopographies meeting to discuss EAC-CPF and xEAC to some degree. The meeting consists mainly of digital humanists who have a lot of experience with TEI and CIDOC-CRM, but may be completely unaware of the emergence of EAC-CPF as a LAM standard for modeling entities and their relationships. Since we at the American Numismatic Society are moving forward with our own prosopography of the Roman Empire (which will tie into other projects, such as Online Coins of the Roman Empire and nomisma.org), we aim to contribute our entity URIs into SNAP, which will facilitate larger scale aggregation of cultural heritage materials related to ancient people. In order to broaden access and use of our data, we will not only provide the source EAC-CPF XML documents, but also alternative serializations in various forms of RDF (like CIDOC-CRM) and TEI conforming to the prosopography recommendations. By the end of the month, I plan to have some basic CIDOC-CRM and TEI exports functional, and possibly to hook xEAC up to an RDF triplestore/SPARQL endpoint as a proof of concept of publishing EAC-CPF as linked open data right out of the box.