Thursday, January 28, 2016

SPARQL-based social network graph in xEAC

I pushed into production a new SPARQL-based social network graph feature in xEAC. The most interesting places to start are and, but we have a lot of work to do to enhance the linkage between our authorities in order to make these visualizations more useful in the future.

Nearly a year ago, I began implementing a new EAC-CPF to RDF data model that could represent a graph of relationships in order to begin experimenting with rendering a social network graph in real time. After investigating the open source Javascript graph visualization tools, I choose vis.js, as it was powerful, easy to use, and could load JSON on the fly. I got a very basic graph working a year ago in time for Moving People, Linking Lives at the University of Virginia, but it wasn't interactive, in that you could not expand beyond the first level of nodes connected to the authority record you were immediately viewing.

After launching our first EBook a few weeks ago in ETDPub (which is integrated with our production installation of xEAC), I decided to revisit xEAC development of the social network graph interface.

The Model

The RDF model implements bits and pieces of various standard ontologies. People, corporate bodies, and families have separate URIs for their entity represented as a Concept and as a Thing. The Concept (skos:Concept) of a person can be linked to concepts of that person in other vocabulary systems, like the Getty ULAN, VIAF, Wikidata, or SNAC. This is also the data object where you may also include provenance about the creation and modification of the object record. For example, dcterms:created applied for a foaf:Person would imply that the person was born on the given date, but when used in a skos:Concept, this implies that the concept data object would have been created in the data system at the given date.

The Concept object is connected to the Thing object with foaf:focus.

The Thing object contains mainly biographical information, using the bio ontology. While much work remains to be done to link individuals to events, basic birth and death dates are represented, as well as a string of bio:relationships. Each bio:Relationship object contains a property defining the nature of the relationship and the target entity of the relationship. I will probably revisit the properties by which people are linked to organizations (using the org ontology more properly), but the model does function well enough to generate a graph of relationships.


Vis.js renders two JSON models, one for nodes and the other for edges, into a visual graph following HTML5 standards. Essentially, I had to build two web services in xEAC that would deliver these JSON models that could be read in real time via Ajax. The underlying model for these services is the SPARQL query, and the views are generated with two different XSLT stylesheets to generate the JSON that vis.js requires to render the graph. The query is this:

SELECT ?sourceName ?type ?target ?name ?class WHERE {
 <URI> foaf:name ?sourceName ;
       bio:relationship ?rel .
  ?rel xeac:relationshipType ?type ;
       bio:participant ?target .
  ?target foaf:name ?name ;
          a ?class

Essentially, we get all of the relationships connected to a particular entity (URI), the type of relationship (e.g., rel:spouseOf), and the target entity, whether another URI in the system or a blank node. The SPARQL response is processed and serialized into JSON. When clicking on connecting nodes in the graph visualization--if the target node is not a blank node RDF object (therefore, another authority in xEAC)--vis.js fires off another Ajax call to create new nodes and edges. Arrows in the graph visualization indicate the directionality of the relationship.

I should say that this is just the first phase of social network graph visualization in xEAC. While I have focused mainly on visualizing relationships on the level of the individual authority, my goal is to expand the application to implement a more sophisticated query interface that allows users to select arbitrary parameters to generate their own visualizations. For example, a user may want to view all persons grouped together by family or corporate body. Or group people by occupation or filter by date or place. All of these things are possible by reconceptualizing EAC-CPF into RDF graphs and developing the SPARQL queries that can be rendered into JSON for vis.js.

Wednesday, January 13, 2016

Survey to help usability testing

I have created a short questionnaire in a Google form to aid in usability testing for our TEI -> EPUB serialization. It is available at

You can download the EPUB file for the Miller Medallic Arts of the American Numismatic Society book here.

First EBook published to ANS Digital Library

This afternoon, we have published our first EBook to the ANS Digital Library in the ETDPub framework. This EBook, Medallic Art of the American Numismatic Society, 1865–2014 by Scott Miller, is encoded in TEI and has been issued with a Creative Commons BY-NC license. While the TEI file has not been fully linked into name and place authority files, I was able to use regex to link to medals in the American Numismatic Society collection and link one prominent scholar, Edward T. Newell, to our archival authority record. The TEI file will be fully linked up later, but the publication of this EBook can be seen as a completely functional demonstration of the technical application of linked open data principles to publishing these types of books for the larger NEH-Mellon Humanities Open Book project.

The TEI is indexed into Solr by ETDPub for full-text search, but this is only the beginning of this system's features. Using Orbeon's XPL pipelines, we are able to cobble together a series of XSLT transformations of the TEI file into XHTML and other XML files (NCX, OPF) required by the EPUB 3.0.1 specification. There is a link to the EPUB download on the page for the EBook, and the EPUB file is generated dynamically. I have tested on an ereader application on my desktop (Ubuntu) and a few on my Android phone. They mostly seem to work well, but the table of contents isn't consistently functional, but this seems to be more of an issue of the individual app not supporting EPUB 3.0.1 correctly rather than the EPUB file itself. I plan to put together a survey to assist in usability testing.

It is also important to note that the focus with EPUB serialization so far has been almost solely on functionality. The XSLT stylesheets are very basic and I have applied almost no CSS styling, but there is potential in enhancing the overall aesthetic of the document. That will come later as functional issues are ironed out. I am aware the tables do not seem to render properly.

The other major feature of ETDPub's TEI publishing is serialization into RDF (so far, XML, but JSON-LD and Turtle outputs are coming). This RDF is fairly rudimentary so far. The RDF contains a data object for the book as a whole (and associated metadata in dcterms, like creator and publisher) and for each child div, using dcterms:isPartOf to link the hierarchical structure of the book together. Furthermore, any link (ref element) within the lowest level relevant div and any name that has been linked via the @corresp attribute to an authoritative URI in the teiHeader will be rendered as an annotation following the Open Annotation model. ETPub is capable of executing CRUD operations with an SPARQL 1.1-compliant endpoint, and so the Digital Library application is posting into the triplestore that links our archival objects and authority records together. I have previously discussed linking xEAC and EADitor together via SPARQL, and now ETPub is capable of doing the same. The authority record for Newell now includes links to the sections in the ANS medals book in which he was mentioned, in addition to the research notebooks and photographs associated with Newell that are contained in the ANS archives. We are moving forward with linking our library, archives, and collection more closely together internally, as well as paving the way for scholars to gain further context with data sources outside the ANS.