Wednesday, March 5, 2014

Linking EAC-CPF Occupations to the Getty AAT

The occupation element in xEAC now supports a SPARQL-based lookup mechanism to link EAC-CPF records to terms defined in the newly-released linked open data Getty AAT.

I won't go into great detail about how this works in the back end, because it is basically identical to the process by which I hooked EADitor into the AAT with EAD genreform elements, which I covered in a blog post last month.

One thing to note, however, is that the xEAC occupation lookup filters for terms that contain "Agents Facet" in the gvp:parentStringAbbrev property. The AAT holds several categories of terms--object types, agents, stylistic periods, etc.--that are not distinguished semantically, but each term at least carries its facet name as a string in this generic property, which makes filtering possible. I hope that the Getty will move forward with a more formal representation of these facets, so that queries can match on a property value rather than run a regular expression over a string.

Therefore queries for occupations look something like this:

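# Prefixes (rdf:, skos:, gvp:, aat:, luc:) are assumed to be predeclared by the endpoint.
SELECT ?c ?label WHERE {
  ?c rdf:type gvp:Concept .
  ?c skos:inScheme aat: .
  ?c skos:prefLabel ?label .
  # Full-text search on the term
  ?c luc:term "president" .
  ?c gvp:parentStringAbbrev ?facet .
  # Keep only terms in the Agents Facet, with English labels
  FILTER regex(?facet, "Agents Facet")
  FILTER langMatches(lang(?label), "en")
}
ORDER BY ASC(?label)
LIMIT 25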
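
For anyone who wants to try this lookup outside of xEAC, here is a minimal Python sketch that builds the query above for an arbitrary search term and facet string. The function names (escape_literal, build_occupation_query, run_query) are made up for illustration and are not part of xEAC. The endpoint URL is my assumption, as is the expectation that the endpoint predeclares the prefixes and returns standard SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; it is not taken from xEAC itself.
ENDPOINT = "http://vocab.getty.edu/sparql"


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_occupation_query(term, facet="Agents Facet", limit=25):
    """Build the facet-filtered AAT lookup for a search term.

    The facet string is used as a regex pattern, so regex metacharacters
    in it are not neutralized here.
    """
    return (
        "SELECT ?c ?label WHERE {\n"
        "  ?c rdf:type gvp:Concept .\n"
        "  ?c skos:inScheme aat: .\n"
        "  ?c skos:prefLabel ?label .\n"
        f'  ?c luc:term "{escape_literal(term)}" .\n'
        "  ?c gvp:parentStringAbbrev ?facet .\n"
        f'  FILTER regex(?facet, "{escape_literal(facet)}")\n'
        '  FILTER langMatches(lang(?label), "en")\n'
        "}\n"
        "ORDER BY ASC(?label)\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return (uri, label) pairs."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [
        (row["c"]["value"], row["label"]["value"])
        for row in data["results"]["bindings"]
    ]
```

Calling run_query(build_occupation_query("president")) should then return the same rows as the query above, provided the assumptions about the endpoint hold.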

I plan to apply these filters to the LOD thesaurus editor for kerameikos.org in order to provide a more accurate list of style periods, pottery techniques, wares, and shapes for linking kerameikos URIs to Getty AAT identifiers. For example, the Getty defines "Black Figure" twice, once as a technique and once as a style or period. Since "Black Figure" on kerameikos is defined as a http://kerameikos.org/ontology#Technique, its owl:sameAs link should point to the Getty term in the technique facet, not to the one in the style or period facet.
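
As a rough sketch of how that disambiguation might work, the Python below keeps only the candidate terms whose parent string names the facet that matches a kerameikos class. Everything here is hypothetical: the function name candidates_in_facet, the FACET_FOR_CLASS mapping, the facet strings (which would need checking against the live AAT data), and the sample rows, whose URIs are placeholders rather than real AAT identifiers.

```python
# Hypothetical mapping from a kerameikos.org class to the facet text expected
# in gvp:parentStringAbbrev. The facet string is an assumption to verify.
FACET_FOR_CLASS = {
    "http://kerameikos.org/ontology#Technique": "Activities Facet",
}


def candidates_in_facet(rows, class_uri, facet_for_class=FACET_FOR_CLASS):
    """Keep the candidate terms whose parent string names the facet
    that corresponds to the given kerameikos class."""
    marker = facet_for_class[class_uri]
    return [row for row in rows if marker in row["facet"]]


# Made-up lookup results for illustration only; the URIs are placeholders.
rows = [
    {"c": "aat:TECHNIQUE-PLACEHOLDER", "label": "Black Figure",
     "facet": "... Activities Facet"},
    {"c": "aat:STYLE-PLACEHOLDER", "label": "Black Figure",
     "facet": "... Styles and Periods Facet"},
]

matches = candidates_in_facet(rows, "http://kerameikos.org/ontology#Technique")
for row in matches:
    print(row["c"], row["label"])
```

With the sample rows, only the technique placeholder survives the filter, which is the term the owl:sameAs link should use.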

2 comments:

  1. Ethan, you must be kidding ;-) Of course there's a much faster way to get what you need: gvp:broaderTransitive. If you don't RTFM, explore the data but check that Inference dropdown!

    1. Look at eg "presidents" with inference
    http://vocab.getty.edu/aat/300025470?inference=all

    2. Notice gvp:broaderTransitive aat:300024978, aat:300024979, aat:300024980, aat:300025426, aat:300025427, aat:300025432, aat:300264089 (yay!)

    3. How to pick the best root? Click on the Hierarchy tab

    4. aat:300024979 "people (agents)" or aat:300024980 looks good

    5. use gvp:prefLabelGVP instead of lang "en" because some may NOT have an "en" prefLabel

    6. Don't forget the wildcard, or you'll miss presidents' wives :-)

    7. Rewritten query (nicer, eh?):
    SELECT * {
    ?c a gvp:Concept; skos:inScheme aat: ;
    gvp:broaderTransitive aat:300024980 ;
    gvp:prefLabelGVP/xl:literalForm ?label ;
    luc:term "president*"
    } ORDER BY ?label LIMIT 25

    8. Would be much obliged if you can enlarge the Post Comment box here

  2. 9. Consider NOT ordering by ?label, since luc:term natively returns them in relevance order:
    • Order by ?label: first ladies, presidents, vice-presidents
    • by Lucene relevance: presidents, vice-presidents, first ladies

    Thanks for this delightful use case! Added query to doc
