XForms for Archives

Tuesday, September 1, 2020

First pass mapping EAC-CPF to Linked Art JSON-LD

After pushing updates to map people and organization concepts in Nomisma.org in Linked Art-compliant JSON, I have implemented a similar serialization into xEAC, the open source authority management framework, based on EAC-CPF, that I have been developing on and off since 2012.

As with the Nomisma and Numishare projects, the response to an HTTP request for an authority URI carries Link headers that point to alternate RDF/XML, Turtle, and JSON-LD serializations of that resource, including a JSON-LD serialization following the Linked Art profile. A capable JSON-LD parser can convert this profile into other serializations of RDF (XML, Turtle, etc.) according to the CIDOC-CRM ontology, which other semantic web developers might more readily recognize.

It is therefore possible to request the Linked Art JSON-LD via content negotiation from xEAC, for example:

curl -H "Accept: application/ld+json;profile=\"https://linked.art/ns/v1/linked-art.json\"" \
     http://numismatics.org/authority/adams_edgar

The Accept header is then parsed by the XProc pipeline in a controller that reads the content-type and profile in order to choose which serialization to enact. In this case, the EAC-CPF document is transformed via an XSLT stylesheet into an intermediate XML document that represents a JSON structure of objects and arrays, which is subsequently transformed by a secondary XSLT stylesheet into a text output, to which the XProc pipeline attaches an `application/ld+json` content-type in the HTTP header. This JSON metamodel approach has been applied throughout many of my frameworks, including Nomisma.org and Numishare, in order to consistently transform various XML schemas into different JSON profiles, from Linked Art to GeoJSON to the model required by d3js for data visualization.
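
As a rough illustration of the controller logic described above, here is a minimal sketch in Python rather than the XProc that xEAC actually uses. The function names and the serialization labels ("linked-art-json", "json-ld", and so on) are hypothetical. The sketch ignores q-values and assumes no commas or semicolons appear inside quoted parameter values, which holds for the Linked Art profile URI.

```python
LINKED_ART_PROFILE = "https://linked.art/ns/v1/linked-art.json"


def parse_accept(header):
    """Split an Accept header into (media_type, params) pairs, in header order."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        params = {}
        for piece in pieces[1:]:
            if "=" in piece:
                key, value = piece.split("=", 1)
                # Drop surrounding whitespace and the quotes around the value
                params[key.strip().lower()] = value.strip().strip('"')
        if media_type:
            entries.append((media_type, params))
    return entries


def choose_serialization(header):
    """Return a (hypothetical) label for the serialization to produce."""
    for media_type, params in parse_accept(header):
        if media_type == "application/ld+json":
            if params.get("profile") == LINKED_ART_PROFILE:
                return "linked-art-json"
            return "json-ld"
        if media_type == "text/turtle":
            return "turtle"
        if media_type == "application/rdf+xml":
            return "rdf-xml"
    # Nothing recognized: fall back to the HTML view
    return "html"
```

Checking the profile parameter separately from the media type is what allows the same `application/ld+json` content type to select either the default JSON-LD or the Linked Art profile.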

The mapping of EAC-CPF for people, corporate bodies, and families follows the specifications for people and organizations drafted by the Linked Art community, at https://linked.art/model/actor/. A representation of a person in the ANS archival authority system, Edgar H. Adams, includes the preferred name and biographical statement (eac:biogHist/eac:abstract), URIs for matching concepts (an xEAC-specific implementation of eac:identity/eac:entityId[@localType = 'skos:exactMatch']), birth/death (for people) and formed_by/dissolved_by (for families and corporate bodies) dates from eac:existDates, and member/member_of links to URIs wherever an eac:cpfRelation carries a relevant W3C Org ontology property in its @xlink:arcrole (also a specific xEAC implementation to align EAC-CPF more directly with Linked Open Data principles).

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "http://numismatics.org/authority/adams_edgar",
  "type": "Person",
  "_label": "Adams, Edgar H. (Edgar Holmes), 1868-1940",
  "identified_by": [
    {
      "type": "Name",
      "content": "Adams, Edgar H. (Edgar Holmes), 1868-1940",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300404670",
          "type": "Type",
          "_label": "Primary Name"
        }
      ]
    }
  ],
  "exact_match": [
    "http://viaf.org/viaf/92956241",
    "http://d-nb.info/gnd/101883196",
    "http://dbpedia.org/resource/Edgar_Adams",
    "http://www.wikidata.org/entity/Q3719031",
    "http://id.loc.gov/authorities/names/n81061401",
    "http://n2t.net/ark:/99166/w6n03w0m"
  ],
  "born": {
    "type": "Birth",
    "_label": "Start Date",
    "timespan": {
      "type": "TimeSpan",
      "begin_of_the_begin": "1868-04-07",
      "end_of_the_end": "1868-04-07"
    }
  },
  "died": {
    "type": "Death",
    "_label": "End Date",
    "timespan": {
      "type": "TimeSpan",
      "begin_of_the_begin": "1940-05-05",
      "end_of_the_end": "1940-05-05"
    }
  },
  "referred_to_by": [
    {
      "type": "LinguisticObject",
      "content": "Edgar H. Adams (1868-1940) of Bayville, Oyster Bay, and Brooklyn, New York, was a numismatic scholar, author, and collector who produced, among other works, reference guides to territorial and private gold coins. He also coauthored, with William H. Woodin, the book United States Pattern, Trial, and Experimental Pieces, a standard reference work on pattern coins. He served as editor of The Numismatist, the monthly journal of the American Numismatic Association, wrote a numismatic column for the New York Sun newspaper, and was a co-founder of the New York Numismatic Club (1908).",
      "classified_as": [
        {
          "type": "Type",
          "id": "http://vocab.getty.edu/aat/300435422",
          "_label": "Biography Statement",
          "classified_as": [
            {
              "id": "http://vocab.getty.edu/aat/300418049",
              "type": "Type",
              "_label": "Brief Text"
            }
          ]
        }
      ]
    }
  ],
  "member_of": [
    {
      "type": "Group",
      "id": "http://numismatics.org/authority/new_york_numismatic_club",
      "_label": "New York Numismatic Club"
    },
    {
      "type": "Group",
      "id": "http://viaf.org/viaf/157729460",
      "_label": "American Numismatic Association"
    }
  ]
}
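
The mapping can be sketched in a few lines of Python. This is not the XSLT that xEAC uses; it is a stand-in that builds the same kind of object-and-array structure from values that would be pulled out of the EAC-CPF record. The function and parameter names are hypothetical, and only the fields shown in the example above are covered.

```python
def timespan(date):
    """A TimeSpan whose earliest and latest bounds are the same date."""
    return {"type": "TimeSpan", "begin_of_the_begin": date, "end_of_the_end": date}


def person_to_linked_art(uri, name, born=None, died=None, exact_matches=(), groups=()):
    """Build a Linked Art-style Person document as a plain dict."""
    doc = {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": uri,
        "type": "Person",
        "_label": name,
        "identified_by": [{
            "type": "Name",
            "content": name,
            "classified_as": [{
                "id": "http://vocab.getty.edu/aat/300404670",
                "type": "Type",
                "_label": "Primary Name",
            }],
        }],
    }
    # Optional sections are only emitted when the source record has the data
    if exact_matches:
        doc["exact_match"] = list(exact_matches)
    if born:
        doc["born"] = {"type": "Birth", "_label": "Start Date", "timespan": timespan(born)}
    if died:
        doc["died"] = {"type": "Death", "_label": "End Date", "timespan": timespan(died)}
    if groups:
        doc["member_of"] = [
            {"type": "Group", "id": group_uri, "_label": label}
            for group_uri, label in groups
        ]
    return doc
```

Passing the result to `json.dumps(doc, indent=2)` would produce text comparable to the example, minus the biographical statement.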

Ideally, we would want to be able to include links to geographic resources for places of birth or death, occupations, and other events as machine-readable data, with actionable xs:date values and references to controlled vocabulary URIs. Some of this is already possible within xEAC because it was built from the ground up to interact with LOD resources, but projects like Social Networks and Archival Context (SNAC) aren't yet well-integrated with external resources.

Wednesday, March 4, 2020

270 hoard documents and 60 authorities added to the ANS Archives

In a major digital archival publication today, 270 documents pertaining to Greek coin hoards have been added into the ANS Digital Archives, Archer, and 60 new archival authorities have been added into the ANS Biographies (EAC-CPF records published in xEAC). These authorities include numerous prominent numismatists, archaeologists, dealers, and collectors, as well as some individuals who are not prominent--people only attested through our archives and scant provenance records from other museums. Each of these authorities will be created or updated in the Social Networks and Archival Context (SNAC) project, along with links back to our archival records.

A nice example is Sir Arthur Evans, the famous archaeologist of Knossos. He is mentioned in several letters between Sidney Noe and other scholars. Although Evans is not a prominent scholar in our own archives, his papers are held in other institutions. We are able to make our few letters more broadly available to researchers interested in Arthur Evans through SNAC.

The record for Arthur Evans, with links to hoard documents.

The archival documents themselves represent the first portion of a larger collection of scanned letters, invoices, inventories, notes, hoard photographs, and other research materials related to The Inventory of Greek Coin Hoards and subsequent Coin Hoards volumes. Coin Hoards will be published online in the near future, after we migrate the old IGCH platform into a completely new database system that operates more like Coin Hoards of the Roman Republic.

The display of IGCH 140, with new archival documents

Under the hood, these archival records are TEI documents generated from spreadsheet metadata entered by Peter van Alfen. The images are IIIF-compliant and follow the procedures we have already established with Edward T. Newell's research notebooks. The Archer framework, EADitor, was updated to accommodate other types of archival materials represented as TEI (manuscripts, etc.), and EADitor is capable of serializing these files directly into RDF for Archer's SPARQL endpoint (which drives the interconnectivity between the authority records and archival items, as well as the display of archival items in MANTIS and IGCH). Additionally, the TEI files, and TEI-encoded annotations, are serialized dynamically into IIIF manifests.

Because all TEI files use the same annotation system in the back-end of EADitor (Masahide Kanzaki's Image Annotator: https://www.kanzaki.com/works/2016/pub/image-annotator), these new archival documents can be annotated with URIs from Nomisma.org, coins in our collection, coin types or monograms in PELLA or other corpora. As a proof of concept, I annotated the names of Mithradates VI and Lysimachus with their respective Nomisma URIs on the notes of Wayte Raymond about IGCH 973: http://numismatics.org/archives/ark:/53695/igch973.001. These annotations, stored natively in TEI surface elements within a facsimile, are serialized into JSON-LD according to the IIIF spec in real time, and displayed at the link above in Mirador. The names are also listed in the index below the Mirador viewer.

While we still have more metadata to enter for more archival documents, the data-entry workflow and processing scripts are fully established at this stage. This is the next step in transforming the IGCH database into a more comprehensive research platform for Greek coin hoards.

Tuesday, July 9, 2019

135 ANS authority records merged into SNAC

Finally, having fine-tuned the xEAC-to-SNAC publication workflow over the last few months (the functionality was first built into xEAC last summer), I have switched over to the SNAC production API. We have integrated authority data from 135 EAC-CPF records in the American Numismatic Society Biographies into the Social Networks and Archival Context project. Among these authority records are dozens of new ones inserted into SNAC, complete with biographical information and references to digital archival and library holdings at the ANS. One of the more notable additions to SNAC is Margaret Thompson, one of the most prominent Greek numismatists of the later 20th century and a long-time curator at the ANS.

Not only have we provided a comprehensive biography of Margaret Thompson, but also URIs in other systems, such as VIAF and Wikidata. The Bibliographic Resources for Thompson include numerous archival photographs (which link back to the ANS Archives--many of these are available in IIIF) and four ebooks in our Open Access Digital Library. These ebooks were digitized as part of the NEH-Mellon Foundation Open Humanities Book program.

SNAC record for Edward T. Newell, with biography from the ANS.

In fact, since many of the ~200 books digitized as part of this NEH-Mellon project were authored by prominent numismatists represented in the ANS archival authorities, 74 of these books have been made accessible to scholars through SNAC. This was the aim of our initial application to this grant program--finally realized by much work in extending xEAC to be able to interact with SNAC's JSON APIs. We not only wanted to create a large corpus of TEI ebooks that linked to URIs in our numismatic collection or research databases like Online Coins of the Roman Empire and the Inventory of Greek Coin Hoards (and similar systems), but to integrate these books into the larger cloud of cultural heritage data by linking the authors to large-scale authority systems like SNAC that could be leveraged to point researchers back to our own services.

SNAC was funded not only by Mellon (like our ebooks project), but also initially by the IMLS and the NEH. In this way, we are providing value to funders by building upon projects in which they have already invested: creating a whole that is greater than the sum of its parts. I hope that other institutions will look at xEAC and our broader archival LOD strategy (see Linked Open Data and Hellenistic Numismatics and Linked Open Data for Numismatic Library, Archive, and Museum Integration for further information about this architecture) as a means by which they too can enhance SNAC while simultaneously broadening access to their own materials.

By incorporating our archival authorities and digital archives and library into SNAC, we are providing pathways through broader, more generalized aggregators for non-numismatic researchers who may otherwise never think to query our archives directly. A great example of this is the record for the prominent sculptor, Augustus Saint-Gaudens. This record links to more than 160 finding aids published by dozens of institutions, including museum archives, and so art historians may find correspondences in our archives as well as the Smithsonian Archives of American Art or the New York Public Library. Furthermore, since we have already used the Wikidata API look-up inherent to xEAC to embed related authority URIs in our own EAC-CPF record, we inserted the Getty ULAN URI for Saint-Gaudens into SNAC. This would, in theory, make it possible for SNAC to interact with art historical aggregators built on the Getty vocabularies to extract other works of cultural heritage, such as medals held at the American Numismatic Society or sculptures held in other art museums both in the United States and abroad.

I think we are only seeing the tip of the iceberg of what will be possible by interacting with SNAC.

Thursday, January 10, 2019

Updates to IIIF image annotation in the EADitor back-end

The American Numismatic Society's archival images were migrated into IIIF in the fall of 2017, including the extension of EADitor to facilitate the creation of manifests from TEI files that represent the Newell notebooks. While the front end was updated to use Leaflet for single photographs (MODS records) or Mirador for image collections, like the notebooks or the Agnes Baldwin Brett papers, the back-end had not been updated to enable the editing or creation of new annotations.

After the back-to-back releases of the full Seleucid Coins Online and the first phase of Ptolemaic Coins Online in December, I have been able to pivot completely from coin type corpora and data cleaning to working on our digital archives for a brief period. After fixing some bugs, I turned my attention to piecing the image annotation back together in the XForms engine for TEI editing/publication within Archer. The original system was developed in 2014. This blog post covers most of the technical underpinnings, but to summarize: Rainer Simon's Annotorious was hooked into OpenLayers to facilitate image annotation. The create/remove/update handlers in Annotorious were used to round trip the annotations to/from TEI surface elements within tei:facsimiles and Annotorious' JSON model in the XForms engine (using the client-side Javascript hooks in Orbeon). There have been significant updates to Orbeon since 2014, and my original code was somewhat broken, and therefore I needed to explore alternative solutions.

My first attempt was loading a manifest for a Newell notebook into Mirador in the XForms web form. Although Mirador did load the manifest, due to some unforeseen conflicts between the JavaScript in Orbeon and Mirador, the annotation popups (with the TinyMCE library) didn't function correctly. I then began to explore Masahide Kanzaki's Image Annotator. This was appealing, as I had tested this application's ability to show two images on the same canvas in dynamically SPARQL-generated IIIF manifests from Numishare-based type corpora (see this example of RRC 15/1a that combines IIIF images from three different museums into one manifest--one canvas per coin and two images per canvas). The Image Annotator not only loads IIIF manifests into OpenSeadragon, but was extended to support Annotorious for creating and viewing annotations.

After several days of work, I have been able to fully reactivate image annotation in the EADitor back-end with the Image Annotator. It took a little bit of reverse engineering in order to find the functions for the handlers, with some slight modifications to my original code to hook the Annotorious handlers into the XForms engine. This included some changes in the mathematical calculations for converting the ratio-based coordinates to pixels for the TEI surface's upper-left x,y and lower-right x,y attributes. These TEI attributes are serialized into proper #xywh fragments in the Web Annotations in the manifest.
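
The coordinate arithmetic can be sketched as follows. This is a Python illustration, not the JavaScript/XForms code in EADitor. It assumes the annotation geometry arrives as x, y, width, and height expressed as fractions of the image dimensions, and that the pixel values are stored in the TEI ulx, uly, lrx, and lry attributes. The function names are hypothetical.

```python
def ratio_to_tei_box(x, y, width, height, image_width, image_height):
    """Convert ratio-based geometry into pixel corners for a TEI surface/zone."""
    return {
        "ulx": round(x * image_width),             # upper-left x
        "uly": round(y * image_height),            # upper-left y
        "lrx": round((x + width) * image_width),   # lower-right x
        "lry": round((y + height) * image_height), # lower-right y
    }


def tei_box_to_xywh(box):
    """Turn upper-left/lower-right pixel corners into a #xywh media fragment."""
    w = box["lrx"] - box["ulx"]
    h = box["lry"] - box["uly"]
    return f"#xywh={box['ulx']},{box['uly']},{w},{h}"


# A region covering the middle half of the width and a quarter of the height
# of a 2000 x 1000 pixel image
box = ratio_to_tei_box(0.25, 0.5, 0.5, 0.25, 2000, 1000)
fragment = tei_box_to_xywh(box)  # "#xywh=500,500,1000,250"
```

Storing corners in TEI and deriving width and height only at serialization time keeps the TEI independent of the fragment syntax used in the manifest.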

Fig. 1: Image Annotator in the XForms engine

I also had to track down and comment out some components of the UI (like the document metadata and links) and tweak the CSS so that the OpenSeadragon window fit within the parameters of my existing Bootstrap 3.x template.

URIs in certain namespaces are still parsed to extract human-readable labels (see Fig. 1 and 2), for example, from the ANS collection. My intention is to extend the range of parseable URIs to include Wikidata, other URIs in the ANS digital library or archives, Social Networks and Archival Context, Worldcat Works, and, eventually, URIs for Hellenistic monograms. I might even extend the parsing to extract thumbnail images for coins and store those in the tei:desc within the TEI document (in addition to simple mixed content w/ tei:ref elements as external links).

Fig. 2: After clicking 'Save', the URI is replaced with an HTML link

After the reworking of the IGCH data over the next several months, we will turn our attention to annotating more of Edward T. Newell's notebooks as part of the NEH-funded Hellenistic Royal Coinages (HRC) project. The UI provided by the Image Annotator is much easier to work with than the one I had developed more directly within XForms nearly five years ago, and so we should see some significant progress toward annotating these notebooks to link to coins in our (or other) numismatic collections, coin types in HRC, Greek coin hoards, and our yet-to-be-published database of Greek monograms. And these annotations will enhance research context in our other platforms by pointing users back to individual notebook pages in Archer from Mantis or IGCH (for example, from http://coinhoards.org/id/igch1664 or http://numismatics.org/collection/1944.100.26870).

SPARQL-generated list of Open Annotations related to IGCH 1664

Thursday, November 15, 2018

An American Europeana

The blog is often reserved for updates or technical explanations of archival/authority software development at the American Numismatic Society, or experimentation in new modes of archival data publication (mainly Linked Open Data).

However, since I have long been a proponent of open, community-oriented efforts to publish cultural heritage aggregations, like Europeana and DPLA, I wanted to take a bit of time to hash out some thoughts in the form of a blog post instead of starting a series of disjointed Twitter threads [1, 2].

Most of you have likely heard that DPLA laid off six employees, and John S. Bracken went online to speak of his vision and answer some questions. This vision seems to revolve around ebook deals primarily, with cultural heritage aggregation as a secondary function of DPLA. However, DPLA laid off the people that actually know how to do that stuff, so the aggregation aspect of the organization (which is its real and lasting value to the American people) no longer seems viable.

I believe the ultimate solution for an American version of Europeana is tying it into the institutional function of a federally-funded organization like the Library of Congress or Smithsonian, with the backing of Congressional support for the benefit of the American people (which is years away, at least). However, I do think there are some shorter-term solutions that can be undertaken to bootstrap an aggregation system administered by one organization or a small body of institutions working collaboratively. There doesn't need to be a non-profit organization in the middle to manage this system, at least at this phase.

There are a few things to point out regarding the system's political and technical organization:

The real heavy lifting is done by the service/content hubs. It takes more time/money/professional expertise to harvest and normalize the data than it does to build the UI on top of good quality data.
Much of the aggregation software has been written already, but hasn't been shared broadly with the community.
There seems to be a wide variation in the granularity and quality of data provided to DPLA. I wrote a harvester for Orbis Cascade that provided them with DPLA Metadata Application Profile-compliant RDF that had some normalization of strings extracted from Dublin Core to Getty AAT and VIAF URIs, which were modeled properly into SKOS Concepts or EDM Agents. But DPLA couldn't actually ingest their own data model.
Europeana has already written a ton of tools that can be repurposed.
There are other off-the-shelf tools that scale and could be appropriated for either the UI or underlying architecture (Blacklight, various open source triplestores, like Apache Fuseki, which I have heard will scale at least to a billion triples).
On a non-technical level, the name "Digital Public Library of America" itself is problematic, because the project has been overwhelmingly driven by R1 research libraries. Cultural Heritage is more than what you find in a Special Collections Library, and museums are notably absent from this picture (in contrast to Europeana).

Without knowing more of the details, I had heard that DPLA had scaling issues with their SPARQL endpoint software. I don't know if this is still an issue with this particular software, but I do believe the data were a problem. Aside from what was produced by those organizations that are part of Orbis Cascade that opted to reconcile their strings to things (sadly, most did not choose to take this additional step), how much data ingested by DPLA is actual, honest to God Linked Open Data--with, you know, links? A giant triplestore that's nothing but literals is not very useful, and it's impossible to build UIs for the public that can live up to the potential of the data and the architectural principles of LOD.

At some point, there needs to be a minimum data quality barrier to entry into DPLA, and part of this is implementing a required layer of reconciliation of entities to authoritative URIs. I understand this does create more work for individual organizations that wish to participate, but the payoffs are immense:

Reconciliation is a two way street: it enables you to extract data from external sources to enhance your own public-facing user interface (biographies about people--that sort of thing).
Social Networks and Archival Context should play a vital role in the reconciliation of people, families, and corporate bodies. There should be greater emphasis in the LibTech community to interoperate with SNAC in order to create entities that only exist in local authority files, which will then enable all CPF entities to be normalized to SNAC URIs upon DPLA ingestion.

Furthermore, SNAC itself can interact with DPLA APIs in order to populate a more complete listing of cultural heritage objects related to that entity. Therefore, there is an immediate benefit to contributors to DPLA, as their content will simultaneously become available in SNAC to a wide range of researchers and genealogists via LOD methodologies.
SNAC is beginning to aggregate content about entities, so it frankly doesn't make sense for there to be two architecturally dissimilar systems that have the same function. DPLA and SNAC should be brought closer together. They need each other in order for both projects to maximize their potential. I strongly believe these projects are inseparable.

With regard to the first two points, content hubs should put greater emphasis on building the reconciliation services for non-technical librarians, archivists, curators, etc. to use, with intuitive user interfaces that allow for efficient clean-up. Many people (including myself) have already built systems that look up entities in Geonames, VIAF, SNAC, the Getty AAT/ULAN, Wikidata, etc. This work doesn't need to be done from scratch.

Because DPLA's data are so simple and unrefined, many of the lowest hanging fruits in digital collection interfaces have not been achieved, such as basic geographic visualization. Furthermore, facet fields are basically useless because there's no controlled vocabulary.

After expanding the location facet for a basic text search of Austin, I am seeing lists that appear to be Library of Congress-formatted geographic subject headings. The most common heading is "United States - Texas - Travis County - Austin", mainly from the Austin History Center, Austin Public Library. However, there are many more variations of the place name contributed by other organizations.

The many Austins

This is really a problem that needs to be addressed further down the chain from DPLA at the hub level. If you want to build a national aggregation system that reaches its full potential, more emphasis needs to be placed on data normalization.

DPLA decided to go large scale, low quality. I am much more of a small scale, good quality person, because it is easier to scale up later once you have the workflows to produce good quality data than it is to go back and clean up a pile of poor data. And I don't think that the current form of the DPLA interface is powerful enough to demonstrate the value of entity reconciliation to the librarians, curators, etc. making the most substantial investment of time. You can't get the buy-in from that specialist community without demonstrating a powerful user interface that capitalizes on the effort they have made. I know this from experience. Nomisma.org struggled to get buy-in until we built Online Coins of the Roman Empire, and now Nomisma is considered one of the most successful LOD projects out there.

My recommendation is to go back to the drawing board with a small number of data contributors to develop the workflows that are necessary to build a better aggregation system. This process should be completely transparent and can be replicated within the other content hubs. The burden of cleaning data shouldn't fall on the shoulders of DPLA (or whoever comes next).

There are obvious funding issues here, but contributions of staff time and expertise can be more valuable than monetary contributions in this case.

Wednesday, July 11, 2018

Creating and Updating SNAC constellations directly in xEAC

After 2-3 weeks of work, I have made some very significant updates to xEAC, one of which paves the way to making archival materials at the American Numismatic Society (and other potential users of our open source software frameworks) broadly accessible to other researchers. This is especially important for us, since we are a small archive with unique materials that don't reach a general historical audience, and we are now able to fulfill one of the potentialities we outlined in our Mellon-NEH Open Humanities Book project: that we would be able to make 200+ open ebooks available through Social Networks and Archival Context (SNAC).

I have introduced a new feature that interacts with the SNAC JSON API within the XForms backend of xEAC (note that you need to use an XForms 2.0 compliant processor for xEAC in order to make use of JSON data). The feature will create a new constellation if none exists or supplement existing constellations with data from the local EAC-CPF record. While the full range of EAC-CPF components is supported by the SNAC API, I have focused primarily on the integration of the stable URI for the entity in the local authority system (e.g., http://numismatics.org/authority/newell), existDates (if they are not already in the constellation), and the biogHist. Importantly, if xEAC users have opted to connect to a SPARQL endpoint that also contains archival or library materials, these related resources will be created in SNAC and linked to the constellation.

It should be noted that this system is still in beta and has only been tested with the SNAC development server. There is still work to do with improving the authentication handshake between xEAC and SNAC.

The process

Step 1: Reviewing an existing constellation for content

The first step of the process is executed when the user loads the form. If the EAC-CPF record already contains an entityId that conforms to the permanent, stable SNAC ARK URI, a "read" query will be issued to the SNAC API in order to determine what content already exists in the constellation, including what resources are already available in the constellation vs. the resources extracted from the local archival information system via SPARQL.

The SPARQL query for extracting resources from the endpoint is as follows:

PREFIX rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms:  <http://purl.org/dc/terms/>
PREFIX foaf:     <http://xmlns.com/foaf/0.1/>

SELECT ?uri ?role ?title ?type ?genre ?abstract ?extent WHERE {
  ?uri ?role <http://numismatics.org/authority/newell> ;
       dcterms:title ?title ;
       rdf:type ?type ;
       dcterms:type ?genre .
  OPTIONAL {?uri dcterms:abstract ?abstract}
  OPTIONAL {?uri dcterms:extent ?extent}
} ORDER BY ASC(?role)

I recently made an update to our Digital Library and Archival software so that every different type of resource (ebooks and notebooks in TEI, photographs in MODS, finding aids in EAD) will include a dcterms:type linking to a Getty AAT URI in the RDF serialization. This AAT URI, in conjunction with the rdf:type of the archival or library object (often a schema.org Class), will help determine the type of resource according to SNAC's own parameters (BibliographicResource, ArchivalResource, DigitalArchivalResource). Additionally, the role of the entity with respect to the resource (dcterms:creator, dcterms:subject) informs the role within the SNAC resource-constellation connection: creatorOf, referencedIn. Abstracts and extents are inserted, if available.
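
The role mapping can be sketched like this, in Python rather than the XForms code in xEAC. The sketch reads the standard SPARQL 1.1 JSON results format and translates the dcterms property into the SNAC role named above. The output dictionary keys are hypothetical and are not SNAC's API schema. The choice between BibliographicResource, ArchivalResource, and DigitalArchivalResource is deliberately left out, since the exact rules for it are not given here.

```python
# Role of the entity in the local RDF mapped to the SNAC connection role
ROLE_MAP = {
    "http://purl.org/dc/terms/creator": "creatorOf",
    "http://purl.org/dc/terms/subject": "referencedIn",
}


def rows_to_resources(sparql_json):
    """Turn SPARQL JSON result bindings into simple resource dictionaries."""
    resources = []
    for binding in sparql_json["results"]["bindings"]:
        role = ROLE_MAP.get(binding["role"]["value"])
        if role is None:
            # Skip properties that have no SNAC role equivalent in this sketch
            continue
        resource = {
            "uri": binding["uri"]["value"],
            "title": binding["title"]["value"],
            "role": role,
            "rdf_type": binding["type"]["value"],
            "genre": binding["genre"]["value"],
        }
        # Abstract and extent come from OPTIONAL clauses, so they may be absent
        for optional in ("abstract", "extent"):
            if optional in binding:
                resource[optional] = binding[optional]["value"]
        resources.append(resource)
    return resources
```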

Step 2: Validate authentication

SNAC uses Google user tokens for validation within its own system. There is currently no handshake between xEAC and SNAC that would allow multiple xEAC users to each have their own credentials in SNAC. At the moment, the "user" information is stored in the xEAC config file. A user will have to enter their Google credentials from the SNAC API Key page into the web form and click the "Confirm User Data" button. xEAC will submit an "edit" to a random constellation to verify the validity of the authentication information. If it is successful, the credentials are then stored back into the config (although the token only lasts about 24 hours) and the constellation is immediately unlocked. The user will then proceed to the create/update constellation interface.


Authenticating through xEAC

Step 3: Creating or updating a constellation

The user will now see several checkboxes to add information into the constellation. Eventually, it will be possible to remove data as well. Below is a synopsis of options:

Same As URI: The URI of the entity in the local authority system will be added into the constellation. This is especially important for establishing concordances between different vocabulary systems.
Exist dates can be added into the constellation if they are not already present.
If there isn't already a biogHist in the constellation and there is one present in the EAC-CPF record, the biogHist will be escaped and published to SNAC. A source will also be created in the constellation in order to link the new biogHist to SNAC control metadata, tying the new biogHist directly to the local URI for the authority. This makes it possible to update or delete only the biogHist associated with your own entity without overwriting other biogHist information that might already be present within the constellation. While SNAC does support multiple biogHists, only the most recently added biogHist will appear in the HTML view of the entity. For this reason (at present), xEAC will only insert a biogHist if there isn't one in the constellation already. In step 1, if the constellation already contains a biogHist associated with the source URI for your authority, it will hash encode the constellation's biogHist and compare it to the hash-encoded biogHist currently in the EAC-CPF record. If there is a difference between these hashes, the constellation will be updated with the current version of the biogHist in the EAC-CPF record.
A list of resource relations derived from SPARQL will be displayed. All will be checked by default in order to first create the resource with the "insert_resource" API command, and second to connect the constellation to that newly created resource with "update_constellation". Each resource entry will display some basic metadata and whether or not it already exists in the constellation, and what action will be taken. It is possible to uncheck the box for a resource that exists in the constellation to remove it from the constellation.
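
The biogHist comparison and the per-resource checkbox logic described in the list above can be sketched roughly as follows. This is an illustration in Python, not xEAC's actual implementation: the choice of SHA-256, the whitespace normalization, and the function names are all assumptions made for the example, and the returned action labels are only descriptive strings.

```python
import hashlib


def biog_hist_digest(text: str) -> str:
    """Hash a biogHist after collapsing runs of whitespace.

    Both the normalization step and the use of SHA-256 are assumptions;
    the post does not say which hash xEAC applies.
    """
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def biog_hist_needs_update(constellation_text: str, eac_text: str) -> bool:
    """Return True when the EAC-CPF biogHist differs from the copy in SNAC."""
    return biog_hist_digest(constellation_text) != biog_hist_digest(eac_text)


def resource_action(checked: bool, in_constellation: bool) -> str:
    """Map a resource checkbox state to the action described in the list."""
    if checked and not in_constellation:
        # Create the resource first, then connect the constellation to it.
        return "insert_resource, then update_constellation"
    if checked and in_constellation:
        return "no change"
    if in_constellation:
        # Unchecking an existing resource removes it from the constellation.
        return "remove from constellation"
    return "skip"
```

Comparing digests rather than raw strings keeps the check cheap and sidesteps escaping differences only to the extent that the normalization covers them, which is why that step is flagged as an assumption.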


The interface for creating and updating SNAC constellations

Step 4: Saving the ARK back to the EAC-CPF record, if applicable

After the successful issuing of "publish_constellation" to the SNAC API, an entityId with the new SNAC ARK URI will be inserted into the EAC-CPF record, if the constellation is newly created (updates presume the ARK already exists in the EAC record). Saving the EAC record will trigger a re-indexing of the document to Solr and a SPARQL/Update that will insert the ARK as a skos:exactMatch into the concept object for the entity.


PREFIX rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>

INSERT { ?concept skos:exactMatch <ARK>  }
WHERE { ?concept foaf:focus <URI> }
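
To make the placeholders concrete, here is a minimal Python sketch that fills <ARK> and <URI> into the update above. The function name and the character check are hypothetical additions for illustration, and the ARK in the usage note is a made-up placeholder, not a real SNAC identifier.

```python
# Template mirroring the SPARQL/Update above; doubled braces are literal braces.
SPARQL_TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

INSERT {{ ?concept skos:exactMatch <{ark}> }}
WHERE {{ ?concept foaf:focus <{uri}> }}"""

# Characters that may not appear inside a SPARQL IRI reference.
FORBIDDEN = set('<>"{}|^`\\ ')


def build_ark_update(ark: str, uri: str) -> str:
    """Return the SPARQL/Update string for one ARK and one entity URI."""
    for value in (ark, uri):
        if not value or any(ch in FORBIDDEN for ch in value):
            raise ValueError(f"not usable as an IRI: {value!r}")
    return SPARQL_TEMPLATE.format(ark=ark, uri=uri)
```

For example, `build_ark_update("http://example.org/ark:/00000/placeholder", "http://numismatics.org/authority/adams_edgar")` yields an update that attaches the placeholder ARK to the concept whose foaf:focus is the Edgar H. Adams authority URI.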

The data above are those I consider to be most vital to SNAC integration--essential historical or biographical context and related archival or library resources that can be made more broadly accessible. I am not sure how many other authority systems are able to interact with SNAC with this degree of granularity yet, but I am hopeful that these features will propel more unique research materials into the public sphere.

I will briefly touch on these new features when I present our comprehensive LOD-oriented numismatic research platform at SAA next month (I will upload the slideshow soon).

Thursday, June 7, 2018

SNAC Lookups Updated in xEAC and EADitor

Since Social Networks and Archival Context (SNAC) migrated to a new platform, it has published a well-documented JSON-based REST API. Although EADitor and xEAC have had lookup mechanisms to link personal, corporate, and family entities from SNAC to EAD and EAC-CPF records since 2014 (see here), the lookup mechanisms in the XForms-based backends to these platforms interacted with an unpublicized web service that provided an XML response for simple queries.

With the advent of these new SNAC APIs and JSON processing within the XForms 2.0 spec (present in Orbeon since 2016), I have finally gotten around to overhauling the lookups in both EADitor and xEAC. Following documentation for the Search API, the XForms Submission process now submits (via PUT) an instance that conforms to the required JSON model. The @serialization attribute is set to "application/json" in the submission, and the JSON response from SNAC is serialized back into XML following the XForms 2.0 specification. Side note: the JSON->XML serialization differs between XForms 2.0 and XSLT/XPath 3.0, and so there should be more communication between these groups to standardize JSON->XML across all XML technologies.

The following XML instance is transformed into API-compliant JSON upon submission.

<xforms:instance id="query-json" exclude-result-prefixes="#all">
 <json type="object" xmlns="">
  <command>search</command>
  <term/>
  <entity_type/>
  <start>0</start>
  <count>10</count>
 </json>
</xforms:instance>

The submission is as follows:

<xforms:submission id="query-snac" ref="instance('query-json')" 
    action="http://api.snaccooperative.org" method="put" replace="instance" 
    instance="snac-response" serialization="application/json">
 <xforms:header>
  <xforms:name>User-Agent</xforms:name>
  <xforms:value>XForms/xEAC</xforms:value>
 </xforms:header>
 <xforms:message ev:event="xforms-submit-error" level="modal">Error transforming into JSON and/or interacting with the SNAC API.</xforms:message>
</xforms:submission>
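
For readers outside of XForms, the instance and submission above correspond roughly to the following HTTP request, sketched in Python with only the standard library. The function name is hypothetical, and sending `start` and `count` as strings is an assumption based on those elements carrying no type attribute in the instance; the endpoint, method, and User-Agent are copied from the submission.

```python
import json
import urllib.request

# Endpoint taken from the @action of the submission above.
SNAC_API = "http://api.snaccooperative.org"


def build_search_request(term: str, entity_type: str = "",
                         start: int = 0, count: int = 10) -> urllib.request.Request:
    """Build (but do not send) a PUT request mirroring the query-json instance."""
    payload = {
        "command": "search",
        "term": term,
        "entity_type": entity_type,
        # Assumption: untyped elements are serialized as JSON strings.
        "start": str(start),
        "count": str(count),
    }
    return urllib.request.Request(
        SNAC_API,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "User-Agent": "XForms/xEAC",
        },
    )
```

Passing the result to `urllib.request.urlopen()` would send the query; the response body is JSON, which the XForms version instead serializes back into XML in the `snac-response` instance.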

The SNAC URIs are placed into the entityIds within the cpfDescription/identity in EAC-CPF or as the @authfilenumber for a persname, corpname, or famname in EAD.

The next task is to build APIs into xEAC for pushing data (biographical data, skos:exactMatch URIs, and related archival resources) directly into SNAC. By tomorrow, all (or nearly all) of the authorities in the ANS Archives will be linked to SNAC URIs.
