Towards Linking Linguistic Resources in the Humanities

Alexander O'Connor
David Lewis, Felix Sasaki

This research is supported by:
Science Foundation Ireland (Grant 12/CE/I2267), as part of the CNGL Centre for Global Intelligent Content at Trinity College Dublin,
and the European Commission under the 7th Framework Programme (CENDARI, FALCON).

TCD Library
manuscript description

Cannot make assumptions about the frame of reference.

scholarly primitives
http://flic.kr/p/cA7ggU

Sources for training data

Significant pre-processing required!

Hidden Knowledge


Some example pitfalls from CENDARI

http://flic.kr/p/nPi1ga

Can we put some historical documents into a package that is linked and marked up for the purpose of:

Different content—Different markup

How to make it traceable?

What TEI has

What TEI is Missing

Vocabularies

markup

Workflows

workflow

To Sum Up

  1. Source of richer content, with richer tasks
  2. Need to integrate with the overall effort to produce reusable, traceable, marked-up materials for training
  3. Particularly need to handle dialects, polyglot documents and secondary documents
  4. Standards exist, workflows are coalescing, but the last steps need to be joined

Data = ƒ(content, markup, quality, provenance)

Get the corpus I want now, but able to get more later
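
One way to read the formula and the closing point above is as a record type plus a filter. The following is a minimal sketch, not a method from the talk: the class names, field types, the `select_corpus` helper, and the sample values are all hypothetical, and quality is assumed to be a single score between 0 and 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record: where an item came from and what touched it."""
    source: str                      # assumed identifier for the holding institution
    steps: Tuple[str, ...] = ()      # ordered processing steps, kept for traceability


@dataclass(frozen=True)
class DataItem:
    """One unit of training data, mirroring Data = f(content, markup, quality, provenance)."""
    content: str
    markup: str                      # assumed label for the markup scheme, e.g. "TEI"
    quality: float                   # assumed score in [0, 1]
    provenance: Provenance


def select_corpus(items: List[DataItem], markup: str, min_quality: float) -> List[DataItem]:
    """Return the items that use the given markup and meet the quality threshold."""
    return [item for item in items if item.markup == markup and item.quality >= min_quality]


# Made-up sample items, for illustration only.
items = [
    DataItem("sample text A", "TEI", 0.9, Provenance("archive-a", ("ocr", "manual-check"))),
    DataItem("sample text B", "plain", 0.4, Provenance("archive-b", ("ocr",))),
    DataItem("sample text C", "TEI", 0.6, Provenance("archive-a", ("transcription",))),
]

# "Get the corpus I want now": a strict selection.
strict = select_corpus(items, markup="TEI", min_quality=0.8)

# "...but able to get more later": relax the threshold over the same items.
relaxed = select_corpus(items, markup="TEI", min_quality=0.5)
```

Because each item carries its own markup, quality, and provenance, a later and broader selection can be made from the same pool without reprocessing the sources.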