Friday, May 30, 2014

Enabling a SPARQL endpoint in EADitor and xEAC

Not long ago, I discussed the enhancement of both xEAC and EADitor by connecting them through a SPARQL endpoint. I have extended this further by enabling a wrapper to this endpoint in both xEAC and EADitor. It didn't take long to employ, as I basically copied and pasted some files/code from the nomisma.org Github repository.

Essentially, once SPARQL endpoint URLs have been inputted into the config for xEAC or EADitor, a checkbox is made available to enable a wrapper to the endpoint. By wrapper, I mean a pipeline is created for the SPARQL query interface and the query response. This wrapper interacts with the SPARQL endpoint directly, but the xEAC's/EADitors SPARQL page carries the same style as the rest of the site, and if HTML results are selected, the SPARQL XML response is fashioned through an XSLT stylesheet in xEAC/EADitor into HTML which also conforms to the inherent style of the application. It's essentially identical to the functionality on http://kerameikos.org/sparql: the page and query response are merely interfaces to Fuseki's endpoint.


Why is the SPARQL endpoint useful?

SPARQL is a complicated beast, and I do plan on writing documentation on the ontologies used and models implemented in both xEAC and EADitor. Most users will likely not use the SPARQL query endpoint directly. But the major point is: it exists for a small subset of users that want to perform really sophisticated queries on the dataset. In the same vein, xEAC will also eventually expose an XQuery interface for performing different types of complex queries on the dataset.

The real advantage: building UIs on SPARQL

As demonstrated in a variety of other projects that I work on: the best uses of SPARQL are the ones you don't even realize. On http://kerameikos.org/id/red_figure, the timeline/map, list of thumbnails, and chart (if generated) showing the distribution of a particular typology are all generated by SPARQL queries and rendered into something that is far more visually understandable to a human being. Likewise with a chart showing the change in weight of Roman imperial denarii from the start of the Roman Empire in 27 B.C. to about A.D. 220 (the end point in which we currently have data): http://numismatics.org/ocre/visualize?measurement=weight&chartType=line&interval=5&fromDate=-30&toDate=220&sparqlQuery=nm%3Adenomination+%3Chttp%3A%2F%2Fnomisma.org%2Fid%2Fdenarius%3E#measurements .

While there is still much work to do in refining the ontologies and models used for representing EAC-CPF or EAD records as RDF, now that the SPARQL publication mechanism actually functions, it will be possible to begin to build more sophisticated visualizations on top of these queries--to incorporate social network graph visualizations into xEAC, dynamically generated by SPARQL and easily manipulated and navigated by users.

The first step, however, is to be able to generate lists of related resources, such as a test finding aid linked to Edward T. Newell, below:


 
We plan to deploy xEAC and the current version of EADitor into production at the American Numismatic Society fairly soon for Archer 2.0. While the ANS' archives are fairly small, I think you can get the idea of the potential for a large collection of entity records, like SNAC, when you are able to link millions of entities to many tens of millions of related materials.

Similarly, in EADitor, when a finding aid has been linked to an entity in xEAC (and the @type of the persname, famname, or corpname has been set to xeac:entity [automatically done in the editing interface]), EADitor will extract biographical information directly from the EAC-CPF record:


No comments:

Post a Comment