perseus greek & latin treebanks


Anna-Sophia Zingarelli-Sweet

University of Pittsburgh School of Information Sciences

LIS 2975: Digital Scholarship

16 October 2013

anz31@pitt.edu // @aszingarelli

what is a treebank?

what is dependency structure?

  • modern syntactic theory
  • finite verb as structural center
  • all other words "depend" on the verb, i.e. they are described in relationship to the verb
  • Lucien Tesnière, Éléments de syntaxe structurale, pub. posthumously 1959
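
As a rough illustration of the idea above, a dependency structure can be stored as a list of words that each point at their head, with the finite verb as the root. This is only a minimal sketch: the Latin toy sentence, the tuple layout, and the relation labels are assumptions made for the example, not taken from the treebank data.

```python
from collections import defaultdict

# Toy sentence: "puella rosam amat" ("the girl loves the rose").
# Each token is (id, form, head_id, relation); head_id 0 marks the root.
# The relation labels are illustrative, not an official tagset.
tokens = [
    (1, "puella", 3, "SBJ"),
    (2, "rosam", 3, "OBJ"),
    (3, "amat", 0, "PRED"),
]

def children_by_head(tokens):
    """Group token ids under the id of the word they depend on."""
    children = defaultdict(list)
    for tok_id, _form, head, _rel in tokens:
        children[head].append(tok_id)
    return children

def render(tokens, head=0, depth=0):
    """Return indented lines, one per token, walking down from the root."""
    by_id = {t[0]: t for t in tokens}
    children = children_by_head(tokens)
    lines = []
    for tok_id in children[head]:
        _id, form, _head, rel = by_id[tok_id]
        lines.append("  " * depth + f"{form} ({rel})")
        lines.extend(render(tokens, tok_id, depth + 1))
    return lines

tree_lines = render(tokens)
print("\n".join(tree_lines))
```

The verb prints at the left margin and every other word is indented beneath it, which mirrors the "verb as structural center" view.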

The Ancient Greek and Latin Dependency Treebanks

"are an attempt to create a linguistic genome: a large database of Classical texts where the morphological, syntactic, and lexical information for each sentence has been explicitly encoded.

The point? To put linguistic research in Greek and Latin on a new quantitative foundation. To help drive a new generation of computational analysis. And above all, to get students and faculty both involved in the production of data that can be useful to the wider scholarly community."


  • Labor intensive: 200+ researchers annotating 350,000+ words (and this is a tiny fraction of the corpus)
  • "standard" production method: 2 researchers annotate independently, then a 3rd reconciles differences
  • "scholarly" production method: single researcher "publishes" their own annotation
  • XML files all available under Creative Commons
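
The "standard" workflow above can be sketched as a comparison of two independent annotations that leaves only the disagreements for the third researcher. This is a hedged sketch, not the project's actual tooling: the XML snippets are invented, and the element and attribute names (`sentence`, `word`, `id`, `form`, `head`, `relation`) are assumptions that may differ from the published files.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations of one sentence by two annotators.
# Element and attribute names are assumed for this sketch.
ANNOTATOR_A = """
<sentence id="1">
  <word id="1" form="puella" head="3" relation="SBJ"/>
  <word id="2" form="rosam" head="3" relation="OBJ"/>
  <word id="3" form="amat" head="0" relation="PRED"/>
</sentence>
"""
ANNOTATOR_B = """
<sentence id="1">
  <word id="1" form="puella" head="3" relation="SBJ"/>
  <word id="2" form="rosam" head="3" relation="ATR"/>
  <word id="3" form="amat" head="0" relation="PRED"/>
</sentence>
"""

def read_words(xml_text):
    """Map each word id to its (form, head, relation) triple."""
    root = ET.fromstring(xml_text.strip())
    return {
        w.get("id"): (w.get("form"), w.get("head"), w.get("relation"))
        for w in root.iter("word")
    }

def disagreements(xml_a, xml_b):
    """List the words whose annotation differs between the two files."""
    a, b = read_words(xml_a), read_words(xml_b)
    return [
        (wid, a[wid], b.get(wid))
        for wid in sorted(a, key=int)
        if a[wid] != b.get(wid)
    ]

to_review = disagreements(ANNOTATOR_A, ANNOTATOR_B)
for wid, from_a, from_b in to_review:
    print(f"word {wid}: A={from_a} B={from_b}")
```

Only word 2 is flagged here, because the two annotators chose different relation labels for it; the reconciler would decide between them.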

Perseus Digital Library

  • begun 1987
  • contains 3.4 million words of Latin and 4.9 million words of Greek
  • public domain texts, digitized with OCR and encoded in XML

  • Treebanks both benefit from this corpus & provide new services for the library
  • recommender service offers most likely interpretation

machine translation

  • Treebanks allow for extraction of rules
  • Use known translations to map parallel trees
  • Train a program to accurately map out unknown translations
  • (Technical description of this process found in Gideon Kotze et al., "Large Aligned Treebanks for Syntax-based Machine Translation", in Proceedings of the International Conference on Language Resources and Evaluation, 2012)
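
To make "mapping parallel trees" a little more concrete, the toy sketch below takes a Latin sentence, a known English translation, and a word alignment between them, then collects the dependency edges that correspond across the two trees. It is an illustration under invented data, not the method described by Kotze et al.; the sentences, head assignments, and alignment are all assumptions.

```python
# Words are referred to by position. Heads map dependent -> head; roots are omitted.
latin = ["puella", "rosam", "amat"]
latin_heads = {0: 2, 1: 2}

english = ["the", "girl", "loves", "the", "rose"]
english_heads = {0: 1, 1: 2, 3: 4, 4: 2}

# Hypothetical word alignment: Latin position -> English position.
alignment = {0: 1, 1: 4, 2: 2}

def matched_edges(src_heads, tgt_heads, alignment):
    """Return source edges whose aligned counterpart is also a target edge."""
    pairs = []
    for dep, head in src_heads.items():
        if dep in alignment and head in alignment:
            t_dep, t_head = alignment[dep], alignment[head]
            if tgt_heads.get(t_dep) == t_head:
                pairs.append(((dep, head), (t_dep, t_head)))
    return pairs

edge_pairs = matched_edges(latin_heads, english_heads, alignment)
for (s_dep, s_head), (t_dep, t_head) in edge_pairs:
    print(f"{latin[s_dep]} -> {latin[s_head]}  ~  {english[t_dep]} -> {english[t_head]}")
```

Pairs of matching edges like these are the kind of raw material from which correspondence rules could be counted across many aligned sentences.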
