Wednesday, March 26, 2014

Serializing EAC-CPF into CIDOC CRM

xEAC supports a fairly rudimentary RDF/XML output, available by appending '.rdf' to the URI for an entity. There is an RDF ontology based on EAC-CPF, but I am not sure it has seen wide usage (it will eventually be implemented in xEAC, regardless). The RDF model employed in xEAC out of the box is little more than a proof of concept, a placeholder until a more standard model emerges from the archival community. It is loosely based on Aaron Rubinstein's arch ontology and contains little more than labels for name entries, relations (from CPF relations that contain an RDF predicate in the @xlink:arcrole), and a dcterms:abstract derived from the EAC-CPF abstract element.

There has been some use of CIDOC CRM to model people. Much of this work has been done by Michele Pasin and John Bradley at King's College London (see their paper). I am heading to London next week for the first meeting of the Standards for Networking Ancient Prosopographies project, and I suspect I will hear much more about their work in this regard there. In order to reach the broadest audience, I am making EAC-CPF data available through xEAC, serialized into CIDOC CRM. This is no easy task, but I have gotten the ball rolling a little bit. I will make more progress once I learn more about the model at the SNAP meeting.

The great advantage of the CRM is that it is very generalizable, so it can be used to model almost anything. This is a double-edged sword, however: because it is so general, a complicated model is sometimes necessary to communicate a relatively simple concept.

Exist Dates

In EAC-CPF, the date range of existence occupies about four lines of XML, and the @standardDate, @notBefore, and @notAfter attributes communicate ISO standard dates and some semantic certainty (or uncertainty). These can be modeled in CRM, but in a more complicated fashion. First, a person (or family or organization) P92i_was_brought_into_existence_by an E63_Beginning_of_Existence, which P4_has_time-span an E52_Time-Span, which in turn has a human-readable rdfs:label and machine-readable P82a_begin_of_the_begin and P81a_end_of_the_begin. @notBefore maps to P82a_begin_of_the_begin and @notAfter maps to P81a_end_of_the_begin. That covers an eac:fromDate. An eac:toDate follows the same pattern with different terms: E64_End_of_Existence, P81b_begin_of_the_end, and P82b_end_of_the_end. The beginning and end events can each have a place as well, but there are some difficulties in translating birth and death places from EAC-CPF into CRM in this regard.
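To make the chain of nodes described above concrete, here is a minimal sketch in Python. It is not the XSLT that xEAC actually uses. The sample record, the blank node names, the `exist_date_triples` function, and the decision to use @standardDate for both bounds when @notBefore and @notAfter are absent are all assumptions made for illustration. The P81a/P81b/P82a/P82b names follow the time-span extension properties as published with Erlangen CRM, and should be checked against the CRM version in use.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # EAC-CPF namespace

# Hypothetical sample record, not taken from a real xEAC instance.
SAMPLE = """<existDates xmlns="urn:isbn:1-931666-33-4">
  <dateRange>
    <fromDate notBefore="1820" notAfter="1825">c. 1820</fromDate>
    <toDate standardDate="1891">1891</toDate>
  </dateRange>
</existDates>"""

# element name -> (property on the entity, event class, lower bound, upper bound)
MAPPING = {
    "fromDate": ("crm:P92i_was_brought_into_existence_by",
                 "crm:E63_Beginning_of_Existence",
                 "crm:P82a_begin_of_the_begin",
                 "crm:P81a_end_of_the_begin"),
    "toDate": ("crm:P93i_was_taken_out_of_existence_by",
               "crm:E64_End_of_Existence",
               "crm:P81b_begin_of_the_end",
               "crm:P82b_end_of_the_end"),
}


def exist_date_triples(entity, exist_dates_xml):
    """Return (subject, predicate, object) tuples for an existDates element."""
    root = ET.fromstring(exist_dates_xml)
    triples = []
    for name, (prop, event_class, lower, upper) in MAPPING.items():
        el = root.find(f".//{{{EAC_NS}}}{name}")
        if el is None:
            continue
        # Blank node identifiers for the event and its time-span.
        event, span = f"_:{name}Event", f"_:{name}Span"
        triples += [
            (entity, prop, event),
            (event, "rdf:type", event_class),
            (event, "crm:P4_has_time-span", span),
            (span, "rdf:type", "crm:E52_Time-Span"),
            (span, "rdfs:label", (el.text or "").strip()),
        ]
        # Assumption: fall back to @standardDate when a bound is missing.
        std = el.get("standardDate")
        begin = el.get("notBefore", std)
        end = el.get("notAfter", std)
        if begin:
            triples.append((span, lower, begin))
        if end:
            triples.append((span, upper, end))
    return triples


if __name__ == "__main__":
    for triple in exist_date_triples("ex:person1", SAMPLE):
        print(" ".join(triple))
```

Even for this small input the function emits more than a dozen triples, which illustrates the point about a simple concept needing a fairly elaborate model.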

First, the semantics of an exist date are a bit fuzzy. By definition the existDates are "The dates of existence of the entity being described, such as dates of establishment and dissolution for corporate bodies and dates of birth and death or floruit for persons." The only way to distinguish between the birth and death dates of a person and the floruit is by using the localType attribute, and the values of @localType may vary from project to project. If the entity being described is a person and the existDates are those of his or her birth and death, then I should be using properties related to E67_Birth instead of the more generic E63_Beginning_of_Existence (of which E67_Birth is a subclass). Since I cannot reliably tell which case applies, I must opt for the more generic class. The same goes for organizations. Ultimately, the solution to this problem is to implement in the xEAC editing interface a checkbox for inserting a @localType designating whether the existDates are of the life or floruit of the person or organization (e.g., @localType='xeac:birth' or 'xeac:death'). The same goes for the place of birth or death. That way the XSLT stylesheets can read the system-based localType attribute and construct the CIDOC CRM model accordingly, and allow for variation between the exist dates for persons or corporate bodies.
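The selection logic proposed here can be sketched in a few lines of Python (again, only an illustration, not the eventual XSLT). The `event_for` function and the entity type strings are hypothetical, and the 'xeac:birth' and 'xeac:death' values are the proposed @localType values mentioned above, not something xEAC emits today. The birth and death properties used are P98i_was_born and P100i_died_in.

```python
# Generic fallback, keyed by the EAC-CPF date element.
GENERIC = {
    "fromDate": ("crm:P92i_was_brought_into_existence_by",
                 "crm:E63_Beginning_of_Existence"),
    "toDate": ("crm:P93i_was_taken_out_of_existence_by",
               "crm:E64_End_of_Existence"),
}

# More specific subclasses, keyed by the proposed (hypothetical) @localType values.
SPECIFIC = {
    "xeac:birth": ("crm:P98i_was_born", "crm:E67_Birth"),
    "xeac:death": ("crm:P100i_died_in", "crm:E69_Death"),
}


def event_for(date_element, entity_type, local_type=None):
    """Pick the (property, class) pair for one exist date.

    Birth and death only make sense for persons, so every other case,
    including a missing or unrecognized @localType, falls back to the
    generic beginning or end of existence.
    """
    if entity_type == "person" and local_type in SPECIFIC:
        return SPECIFIC[local_type]
    return GENERIC[date_element]


if __name__ == "__main__":
    print(event_for("fromDate", "person", "xeac:birth"))
    print(event_for("fromDate", "person"))
    print(event_for("toDate", "corporateBody", "xeac:death"))
```

Falling back to the generic class keeps the output valid when the @localType is absent or project-specific, at the cost of less precise semantics.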

This is something I will continue to wrestle with over the coming weeks, but eventually I hope to have a fully compatible crosswalk between EAC-CPF and both CIDOC CRM and TEI. CRM includes properties for relating children with parents, but arguably these types of relationships should be maintained in a separate ontology built specifically for relations. In fact, there could be many relationship ontologies, depending on the needs of the project. This, I am sure, will be a topic of discussion at SNAP.
