Thursday, March 17, 2016

First EBook published as part of Mellon/NEH Humanities Open Book Project

This is a follow-up to some major feature additions in MANTIS and IGCH detailed on the Numishare blog.

Today, we published our first out-of-print, open access EBook for the NEH/Mellon Foundation Humanities Open Book Program: Sydney Noe's 1920 Coin Hoards, the first issue of Numismatic Notes and Monographs. As we discussed in our grant application, we had a vendor transcribe into TEI the PDFs of images we received from HathiTrust. The TEI is run through a normalization XSLT stylesheet to correct some issues and pull bibliographic metadata from various sources, and then value-added tagging is applied to link to coins in our collection, hoards on coinhoards.org, and entities in various geographic gazetteers or linked open data vocabulary systems.

As a result, we have not only a digital text that you can view in your browser as HTML5 or download as an EPUB 3.0.1 file, but also a richly tagged document exposed as RDF conforming to Open Annotation, which is then published into our archival SPARQL endpoint (and soon into Pelagios). Many of the technical features of this publication process have already been discussed in this blog or in the post linked above.

This framework is part of a broader effort to integrate all of our Library, Archive, and Museum holdings into a central hub for numismatic research. Linked Open Data methodologies therefore make it possible to gain further insight into the people, places, and things mentioned in these digital publications, and also to provide greater context to our data-driven numismatic research projects like IGCH, OCRE, etc.

Not only do we have a rich set of interlinked numismatic projects focusing on hoards, coins, and coin types, but we now also have links between these and numismatic monographs and journals, archival research notebooks, finding aids, and authority records. In Archer, you can not only read biographical information about Sydney Noe, but also view a map and timeline of his life, explore his social network graph, and access a list of materials written by or about him.

This is the topic of my CAA presentation in Oslo in a few weeks.

Friday, March 11, 2016

Toward a more thoroughly integrated numismatic research system

I am making updates to our systems in preparation for the initial publication of NEH/Mellon EBooks. Part of the project is to thoroughly integrate these EBooks with our collection, archives, IGCH, and related project databases. I still have some work to do, but should have the first EBooks ready next week.
I updated the RDF model for our digitized Newell notebooks to conform to the Open Annotation model used for our EBooks (one book has been published so far, the ANS medals book by Miller). This means that mentions of IGCH, of other scholars represented in our biographies site, and [soon] of individual coins in Newell's notebooks will be made available through those other interfaces.

See http://coinhoards.org/id/igch1399

  • You can click on individual pages where Newell notes IGCH1399, and the page will load in Archer.
  • You can see a list of coin types from this hoard, and you can download the list of coin types or a full list of coins from the hoard (note that we aren't publishing our Greek coins that aren't connected to coin type URIs in nomisma.org's SPARQL endpoint).

On http://numismatics.org/authority/id/newell (an EAC-CPF authority record)

These already functioned:
  • See a list of archival materials about Edward Newell
  • (Fairly new) Several annotations in Miller's Medallic Art of the ANS where he mentions Newell. You can click a link to go directly to a section.
  • A social network graph showing Newell and his relations (also driven by SPARQL, detailed here).

On http://numismatics.org/authority/id/noe
  • As before, you can get a list of archival materials about Noe
  • Newell mentions Noe on two pages of a notebook

Next steps:
  1. Update the code for Mantis to display annotations about specific coins referenced in Newell's notebooks or our EBooks.
  2. Update the Pelagios exports for the Digital Library and Archer to make our EBooks and archival materials more broadly accessible to the ancient world community.
  3. Build widgets into our Digital Library to pull data from our other systems.

This interlinking will be inherent to the publication mechanism for our EBooks. When we publish the first several next week, the annotations will be available in Mantis, the Archer Biographies, IGCH, etc.

I will be discussing these things and more in my presentation at CAA in Oslo at the end of the month.

Thursday, January 28, 2016

SPARQL-based social network graph in xEAC

I pushed into production a new SPARQL-based social network graph feature in xEAC. The most interesting places to start are http://numismatics.org/authority/newell and http://numismatics.org/authority/new_york_numismatic_club, but we have a lot of work to do to enhance the linkage between our authorities in order to make these visualizations more useful in the future.

Nearly a year ago, I began implementing a new EAC-CPF to RDF data model that could represent a graph of relationships in order to begin experimenting with rendering a social network graph in real time. After investigating the open source JavaScript graph visualization tools, I chose vis.js, as it was powerful, easy to use, and could load JSON on the fly. I got a very basic graph working a year ago in time for Moving People, Linking Lives at the University of Virginia, but it wasn't interactive: you could not expand beyond the first level of nodes connected to the authority record you were immediately viewing.

After launching our first EBook a few weeks ago in ETDPub (which is integrated with our production installation of xEAC), I decided to revisit xEAC development of the social network graph interface.

The Model

The RDF model implements bits and pieces of various standard ontologies. People, corporate bodies, and families have separate URIs for their entity represented as a Concept and as a Thing. The Concept (skos:Concept) of a person can be linked to concepts of that person in other vocabulary systems, like the Getty ULAN, VIAF, Wikidata, or SNAC. This is also the data object where you may include provenance about the creation and modification of the record. For example, dcterms:created applied to a foaf:Person would imply that the person was born on the given date, but when applied to a skos:Concept, it implies that the concept data object was created in the data system on the given date.

The Concept object is connected to the Thing object with foaf:focus.

The Thing object contains mainly biographical information, using the bio ontology. While much work remains to be done to link individuals to events, basic birth and death dates are represented, as well as a series of bio:relationship properties. Each bio:Relationship object contains a property defining the nature of the relationship and the target entity of the relationship. I will probably revisit the properties by which people are linked to organizations (using the org ontology more properly), but the model does function well enough to generate a graph of relationships.
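
As a rough illustration of the Concept/Thing split described above, here is a minimal Python sketch that emits the two linked objects as N-Triples-style lines. The example.org URIs, the "#this" suffix for the Thing, the use of skos:exactMatch for links to other vocabularies, the date, and the function names are all assumptions made for illustration; this is not the actual xEAC output.

```python
# Hypothetical sketch of the Concept/Thing pattern; not the real xEAC serialization.
SKOS = "http://www.w3.org/2004/02/skos/core#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Serialize a URI term."""
    return f"<{value}>"


def lit(value):
    """Serialize a plain literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def authority_triples(concept_uri, name, record_created, matches=()):
    """Build triples for one person: a skos:Concept plus the foaf:Person it focuses on."""
    thing_uri = concept_uri + "#this"  # assumed URI convention for the Thing
    triples = [
        (uri(concept_uri), uri(RDF_TYPE), uri(SKOS + "Concept")),
        (uri(concept_uri), uri(SKOS + "prefLabel"), lit(name)),
        # On the Concept, dcterms:created describes the record, not the person.
        (uri(concept_uri), uri(DCTERMS + "created"), lit(record_created)),
        # foaf:focus connects the Concept to the Thing.
        (uri(concept_uri), uri(FOAF + "focus"), uri(thing_uri)),
        (uri(thing_uri), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(thing_uri), uri(FOAF + "name"), lit(name)),
    ]
    for match in matches:
        # Assumed property for links to concepts in other vocabulary systems.
        triples.append((uri(concept_uri), uri(SKOS + "exactMatch"), uri(match)))
    return triples


def to_ntriples(triples):
    """Join serialized terms into one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example = authority_triples(
    "http://example.org/authority/person_a",  # made-up authority URI
    "Person A",
    "2016-01-28",  # made-up record creation date
    matches=["http://example.org/other-vocabulary/person_a"],
)
print(to_ntriples(example))
```

Keeping the provenance on the Concept and the biography on the Thing is what lets dcterms:created mean "record created" without being misread as a birth date.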

SPARQL to Vis.js JSON

Vis.js renders two JSON models, one for nodes and the other for edges, into a visual graph following HTML5 standards. Essentially, I had to build two web services in xEAC that deliver these JSON models so that they can be read in real time via Ajax. The underlying model for these services is the SPARQL query, and the views are produced by two different XSLT stylesheets that generate the JSON vis.js requires to render the graph. The query is this:


SELECT ?sourceName ?type ?target ?name ?class WHERE {
  <URI> foaf:name ?sourceName ;
        bio:relationship ?rel .
  ?rel xeac:relationshipType ?type ;
       bio:participant ?target .
  ?target foaf:name ?name ;
          a ?class .
}

Essentially, we get all of the relationships connected to a particular entity (URI), the type of relationship (e.g., rel:spouseOf), and the target entity, whether another URI in the system or a blank node. The SPARQL response is processed and serialized into JSON. When you click on a connected node in the graph visualization, and the target node is not a blank node (that is, it is another authority in xEAC), vis.js fires off another Ajax call to create new nodes and edges. Arrows in the graph visualization indicate the directionality of the relationship.
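
To make the transformation concrete, here is a minimal Python sketch of turning a SPARQL JSON result for the query above into node and edge lists of the kind vis.js reads. In xEAC this step is done with XSLT stylesheets behind two separate web services; the Python function, the example.org URIs, the sample bindings, and the custom "expandable" flag are assumptions for illustration only. The id, label, group, from, to, and arrows keys are standard vis.js node and edge properties.

```python
import json


def bindings_to_graph(source_uri, sparql_json):
    """Convert SPARQL JSON bindings into vis.js-style node and edge lists."""
    nodes = {}  # keyed by id so that each entity appears only once
    edges = []
    for row in sparql_json["results"]["bindings"]:
        nodes.setdefault(
            source_uri,
            {"id": source_uri, "label": row["sourceName"]["value"]},
        )
        target = row["target"]
        target_id = target["value"]
        nodes.setdefault(
            target_id,
            {
                "id": target_id,
                "label": row["name"]["value"],
                "group": row["class"]["value"],
                # Hypothetical flag: only URI targets (not blank nodes)
                # can be expanded with a further query.
                "expandable": target["type"] == "uri",
            },
        )
        edges.append(
            {
                "from": source_uri,
                "to": target_id,
                "label": row["type"]["value"],
                "arrows": "to",  # arrowhead shows the direction of the relationship
            }
        )
    return {"nodes": list(nodes.values()), "edges": edges}


# Made-up response in the standard SPARQL 1.1 JSON results format.
SAMPLE = {
    "head": {"vars": ["sourceName", "type", "target", "name", "class"]},
    "results": {
        "bindings": [
            {
                "sourceName": {"type": "literal", "value": "Person A"},
                "type": {"type": "uri", "value": "http://purl.org/vocab/relationship/spouseOf"},
                "target": {"type": "uri", "value": "http://example.org/authority/person_b"},
                "name": {"type": "literal", "value": "Person B"},
                "class": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"},
            },
            {
                "sourceName": {"type": "literal", "value": "Person A"},
                "type": {"type": "uri", "value": "http://purl.org/vocab/relationship/friendOf"},
                "target": {"type": "bnode", "value": "b0"},
                "name": {"type": "literal", "value": "Person C"},
                "class": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"},
            },
        ]
    },
}

graph = bindings_to_graph("http://example.org/authority/person_a", SAMPLE)
print(json.dumps(graph, indent=2))
```

Expanding a node would then amount to running the same query with the clicked node's URI substituted for the source and merging the new nodes and edges into the existing graph.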

I should say that this is just the first phase of social network graph visualization in xEAC. While I have focused mainly on visualizing relationships on the level of the individual authority, my goal is to expand the application to implement a more sophisticated query interface that allows users to select arbitrary parameters to generate their own visualizations. For example, a user may want to view all persons grouped together by family or corporate body, to group people by occupation, or to filter by date or place. All of these things are possible by reconceptualizing EAC-CPF into RDF graphs and developing the SPARQL queries that can be rendered into JSON for vis.js.

Wednesday, January 13, 2016

Survey to help usability testing

I have created a short questionnaire in a Google form to aid in usability testing for our TEI -> EPUB serialization. It is available at https://docs.google.com/forms/d/10Prvpm5eDvjNZaeqgXZ7luLeSkVrOgZ3hJX5zjFBuSg/viewform

You can download the EPUB file for Miller's Medallic Art of the American Numismatic Society here.

First EBook published to ANS Digital Library

This afternoon, we published our first EBook to the ANS Digital Library in the ETDPub framework. This EBook, Medallic Art of the American Numismatic Society, 1865–2014 by Scott Miller, is encoded in TEI and has been issued with a Creative Commons BY-NC license. While the TEI file has not been fully linked to name and place authority files, I was able to use regex to link to medals in the American Numismatic Society collection and to link one prominent scholar, Edward T. Newell, to our archival authority record. The TEI file will be fully linked up later, but the publication of this EBook can be seen as a completely functional demonstration of the technical application of linked open data principles to publishing these types of books for the larger NEH-Mellon Humanities Open Book project.
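
For readers curious about what regex-based linking can look like, here is a minimal Python sketch. It assumes that object identifiers in the running text have a year.group.item shape and that each resolves under http://numismatics.org/collection/; the pattern, the base URI, the sample identifier, and the function name are all assumptions for illustration, not the expressions actually used on the TEI file.

```python
import re

# Assumed identifier shape: four-digit year, group number, item number.
ACCESSION = re.compile(r"\b(\d{4}\.\d{1,3}\.\d{1,6})\b")

# Assumed base URI for collection objects.
COLLECTION_BASE = "http://numismatics.org/collection/"


def link_accession_numbers(text):
    """Wrap each identifier found in plain text in a TEI ref element."""

    def to_ref(match):
        identifier = match.group(1)
        return f'<ref target="{COLLECTION_BASE}{identifier}">{identifier}</ref>'

    return ACCESSION.sub(to_ref, text)


# Made-up identifier, used only to show the substitution.
print(link_accession_numbers("ANS 1900.1.1, obverse"))
```

A pattern like this works on plain text nodes only; running it over text that already contains ref elements would wrap the same identifier twice, so a real pipeline would need to skip existing links.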

The TEI is indexed into Solr by ETDPub for full-text search, but this is only the beginning of this system's features. Using Orbeon's XPL pipelines, we are able to cobble together a series of XSLT transformations of the TEI file into the XHTML and other XML files (NCX, OPF) required by the EPUB 3.0.1 specification. There is a link to the EPUB download on the page for the EBook, and the EPUB file is generated dynamically. I have tested it in an ereader application on my desktop (Ubuntu) and in a few on my Android phone. They mostly seem to work well, although the table of contents isn't consistently functional; this seems to be an issue of individual apps not supporting EPUB 3.0.1 correctly rather than a problem with the EPUB file itself. I plan to put together a survey to assist in usability testing.
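
The packaging step at the end of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Orbeon implementation: the OEBPS/ paths, the function name, and the placeholder documents are hypothetical, and the result is only a valid EPUB if the supplied documents include a complete OPF package file and the content it references. What the sketch does show is the container layout that EPUB readers expect: a "mimetype" entry stored first and uncompressed, followed by META-INF/container.xml pointing at the OPF file.

```python
import io
import zipfile

# container.xml tells the reading system where the OPF package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(documents):
    """Zip already-generated files (path -> text) into EPUB container bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The mimetype entry must come first and must not be compressed.
        epub.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        epub.writestr(
            "META-INF/container.xml",
            CONTAINER_XML,
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for path, content in documents.items():
            epub.writestr(path, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder content standing in for the XSLT output (OPF, NCX, XHTML).
epub_bytes = build_epub(
    {
        "OEBPS/content.opf": "<package/>",
        "OEBPS/toc.ncx": "<ncx/>",
        "OEBPS/chapter1.xhtml": "<html/>",
    }
)
print(len(epub_bytes) > 0)
```

Because everything is built in memory, the same function could serve a download link that generates the file on request.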

It is also important to note that the focus with EPUB serialization so far has been almost solely on functionality. The XSLT stylesheets are very basic and I have applied almost no CSS styling, but there is potential to enhance the overall aesthetic of the document. That will come later as functional issues are ironed out. I am aware that the tables do not seem to render properly.

The other major feature of ETDPub's TEI publishing is serialization into RDF (so far, XML, but JSON-LD and Turtle outputs are coming). This RDF is fairly rudimentary so far. The RDF contains a data object for the book as a whole (and associated metadata in dcterms, like creator and publisher) and for each child div, using dcterms:isPartOf to link the hierarchical structure of the book together. Furthermore, any link (ref element) within the lowest level relevant div and any name that has been linked via the @corresp attribute to an authoritative URI in the teiHeader will be rendered as an annotation following the Open Annotation model. ETDPub is capable of executing CRUD operations with a SPARQL 1.1-compliant endpoint, and so the Digital Library application is posting into the triplestore that links our archival objects and authority records together. I have previously discussed linking xEAC and EADitor together via SPARQL, and now ETDPub is capable of doing the same. The authority record for Newell now includes links to the sections in the ANS medals book in which he was mentioned, in addition to the research notebooks and photographs associated with Newell that are contained in the ANS archives. We are moving forward with linking our library, archives, and collection more closely together internally, as well as paving the way for scholars to gain further context with data sources outside the ANS.
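
The shape of this RDF can be sketched roughly as follows. This Python example is an illustration under assumptions rather than ETDPub's actual output: the book and section URIs, the annotation URI pattern, and the choice to put the linked entity in oa:hasBody and the section in oa:hasTarget are hypothetical, although dcterms:isPartOf, oa:Annotation, oa:hasBody, and oa:hasTarget are real terms from Dublin Core and the Open Annotation vocabulary.

```python
DCTERMS = "http://purl.org/dc/terms/"
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def book_triples(book_uri, sections):
    """Build (subject, predicate, object) URI triples for a book's structure.

    sections: list of dicts with 'id', 'parent' (a section id, or None for a
    top-level div), and 'links' (URIs referenced within that section).
    """
    triples = []
    annotation_count = 0
    for section in sections:
        section_uri = f"{book_uri}#{section['id']}"
        if section["parent"]:
            parent_uri = f"{book_uri}#{section['parent']}"
        else:
            parent_uri = book_uri
        # Each div points to its parent, so the hierarchy can be walked upward.
        triples.append((section_uri, DCTERMS + "isPartOf", parent_uri))
        for link in section["links"]:
            annotation_count += 1
            annotation_uri = f"{book_uri}#annotation-{annotation_count}"
            triples.append((annotation_uri, RDF_TYPE, OA + "Annotation"))
            # Assumed direction: the body is the entity mentioned,
            # the target is the section that mentions it.
            triples.append((annotation_uri, OA + "hasBody", link))
            triples.append((annotation_uri, OA + "hasTarget", section_uri))
    return triples


# Made-up book structure; only the Newell authority URI comes from this blog.
example = book_triples(
    "http://example.org/library/book",
    [
        {"id": "ch1", "parent": None, "links": []},
        {
            "id": "ch1-s1",
            "parent": "ch1",
            "links": ["http://numismatics.org/authority/newell"],
        },
    ],
)
for s, p, o in example:
    print(f"<{s}> <{p}> <{o}> .")
```

With the annotations in the triplestore, the authority record page can ask for every annotation whose body is its own URI and list the sections returned as targets.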

Thursday, December 17, 2015

ANS Awarded Funding for NEH/Mellon Foundation’s Humanities Open Book Project

The American Numismatic Society has been chosen as one of ten publishers to participate in the Humanities Open Book project, a joint NEH-Mellon Foundation grant program to convert out-of-print books of enduring scholarship into EPUB e-books licensed to allow readers to search and download these books freely, and to read them on any type of e-reader. The ANS is the only learned society to receive funding for this initiative.

“The large number of valuable scholarly books in the humanities that have fallen out of print in recent decades represents a huge untapped resource,” said NEH Chairman William Adams. “By placing these works into the hands of the public we hope that the Humanities Open Book program will widen access to the important ideas and information they contain and inspire readers, teachers and students to use these books in exciting new ways.”

ANS publications date back to 1866 and include over 500 volumes of numismatic scholarship. Thanks to the funding received from the Mellon Foundation, nearly 100 of its rarest out-of-print books will be converted into free EPUB digital editions. The ANS will go one step further by TEI-encoding these editions for online viewing, searching, and linking. Following best practices of Linked Open Data (LOD), these XML files will link to (and will be able to be linked from) other Open Access (OA) resources in the Humanities, benefiting researchers in history, archaeology, art history, geography, and other disciplines.

“Scholars in the humanities are making increasing use of digital media to access evidence, produce new scholarship, and reach audiences that increasingly rely on such media for information to understand and interpret the world in which they live,” said Earl Lewis, President of the Andrew W. Mellon Foundation.

“Knowledge wants to be free,” Andrew Reinhard, ANS Director of Publications said. “This grant will help the ANS put even more of its collections online for free and open access for anyone who wants it.” The ANS continues its ongoing, longtime commitment to digitization and databases having placed over 600,000 objects online—more than 100,000 of which have been photographed—while contributing tens of thousands of coin records via international projects such as Online Coins of the Roman Empire (OCRE) and PELLA: Coinage of the Macedonian kings of the Argead dynasty. Thanks to the Mellon grant, the ANS can continue to add its publications to this suite of OA materials.

Ethan Gruber, the ANS’s Director of Data Science, said “this is an important project that will enable us to further integrate our numismatic collection, archival materials, and digital library into a cohesive platform to further not only the study of coins, but also the study of the evolution of numismatics.”

“On behalf of the Trustees and staff of the Society, I would like to thank the Andrew W. Mellon Foundation for their generous support of this exciting project,” Ute Wartenberg Kagan, Executive Director of the ANS, said.

The Mellon-funded EPUB and TEI-encoded publications will be available by the end of 2016.

For more information, contact Andrew Reinhard, Director of Publications, at areinhard@numismatics.org.

The full list of works to be made publicly accessible as EBooks through this program is available at https://drive.google.com/file/d/0B0qn_O39OBdmZXZiVjdJZ2pDQjg/view?usp=sharing

Friday, December 4, 2015

The ANS Digital Library, a Look Under the Hood

The ANS announced the launch of its Digital Library a few months ago. There are only a few items in the repository at the moment, but we will be expanding in the very near future to include journal articles and open access EBooks. This blog post will introduce some of the technical concepts behind the open source DL framework, ETDPub.

The idea that initially drove our framework was the desire to make numismatic theses and dissertations more widely and freely accessible. Andrew Reinhard, ANS Director of Publications, came to me in the late summer to put together something very basic that we could launch at the INC in Taormina in late September. At first, I looked into an off-the-shelf tool called Vireo, developed by the Texas Digital Library. However, this platform was designed for the phases of dissertation review and publication into an institutional repository at a university. It is backend-only, with no front end to speak of. The only solution was to build something effective quickly. The basic specifications for ETD publication were: an interface for basic metadata entry, an upload mechanism for PDFs or other documents, and a front end to provide the public with access to the documents.

Since I've done a lot of XForms development upon library metadata standards in the past, and since nearly all of our applications are already built on XRX/SPARQL design concepts in Orbeon, we opted to use Orbeon for this framework as well. We put together a basic MODS template for electronic theses and dissertations and an XForms editor to handle data entry, document upload, and web service interaction. As in EADitor, xEAC, Numishare, etc., there are lookup mechanisms for the Getty LOD thesauri, Geonames, VIAF, Nomisma.org, Pleiades for ancient geography, and LCSH from the Library of Congress. It even includes lookups for authority records from a xEAC installation (like EADitor). We went from development to production in the first version of the framework in about a week.



Saving the MODS file writes it to an eXist XML database, publishes the metadata to Solr, and indexes the document file into Solr for full-text searching using the ExtractingRequestHandler. Yesterday, I extended the publication functionality to serialize MODS into RDF and post the triples to a SPARQL endpoint. This draws content from our Digital Library into our archival platforms built on EADitor and xEAC. We are digitizing auction catalogs, books, and journals edited or authored by prominent numismatic scholars who also played a role in the Society, and who therefore have EAC-CPF records in the ANS Biographies service. For example, our Digital Library contains one auction catalog edited by Edgar H. Adams. The metadata from this catalog are published to the SPARQL endpoint, and two items from our archive (an EAD finding aid and a photograph described in MODS) are also available from the biographical page in the Adams authority record. This is the ideal model for larger-scale aggregation of cultural heritage content associated with archival authorities. It is nearly impossible to maintain these connections by hard-coding resourceRelation elements in the EAC-CPF record.
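
A minimal Python sketch of the MODS-to-RDF step might look like the following. The mapping (title to dcterms:title, names carrying a valueURI attribute to dcterms:creator), the example.org URIs, the sample record, and the function names are assumptions for illustration; the real serialization is done in XSLT within Orbeon and may use different properties. The sketch only builds the text of a SPARQL 1.1 INSERT DATA request and does not send it anywhere.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def mods_to_sparql_update(record_uri, mods_xml):
    """Turn a MODS record into the text of a SPARQL 1.1 INSERT DATA request."""
    root = ET.fromstring(mods_xml)
    statements = []
    title = root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
    if title:
        statements.append(
            f'<{record_uri}> <{DCTERMS}title> "{escape_literal(title.strip())}" .'
        )
    for name in root.findall("mods:name", MODS_NS):
        authority_uri = name.get("valueURI")
        if authority_uri:
            # Only names linked to an authority URI become triples.
            statements.append(
                f"<{record_uri}> <{DCTERMS}creator> <{authority_uri}> ."
            )
    body = "\n".join("  " + statement for statement in statements)
    return "INSERT DATA {\n" + body + "\n}"


# Made-up MODS record for illustration.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example auction catalog</title></titleInfo>
  <name valueURI="http://example.org/authority/person_a">
    <namePart>Person A</namePart>
  </name>
</mods>"""

print(mods_to_sparql_update("http://example.org/library/catalog1", SAMPLE_MODS))
```

Because the triple points from the catalog to the authority URI, the authority record never has to be edited: its page simply queries the endpoint for anything that references it.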

So now we have three standalone software frameworks that comprise our digital library and archive, all connected together via linked open data methodologies. The next step is to begin integrating coins from our collection into this broader network of numismatic information.

We will begin this work soon with the digitization of ANS monographs. These books contain references to coins in our collection, to hoards published on coinhoards.org, to materials in our archive, and to numismatic concepts defined on nomisma.org.

ETDPub already supports the publication of TEI and dynamic serialization of TEI into EPUB 3.0.1.

More details soon.