Treebanking in the World of Thucydides. Linguistic annotation for the Hellespont Project

Posted on 12 November 2012

Talk: Francesco Mambrini (CHS, USA/DAI, Germany), "Treebanking in the World of Thucydides. Linguistic annotation for the Hellespont Project".


Date: Tuesday, 20 November 2012

Time: 17:00-18:30

Venue: TOPOI-Haus Dahlem, Hittorfstr. 18, 14195 Berlin (map).

Poster: Download the PDF here.


The Hellespont Project (German Archaeological Institute and Tufts University) aims to integrate two of the largest collections for the study of Antiquity, the Perseus Digital Library and the Arachne archaeological database, in a dynamic digital research environment. Historians will have access to heterogeneous materials and resources, such as ancient texts, archaeological evidence, historical background, and modern scholarly literature [7], while the documents related to each single event will be interconnected through the CIDOC-CRM model.

By focusing on a limited time period (the history of Athens in the years 479-430 BCE) and one written source (the relevant chapters in the work of the Greek historian Thucydides: 1.79-118) for a first case study, the project will also tackle a decisive problem that has so far received comparatively limited attention. Digital historians have been concerned with the problem of how documents can be organized into digital sourcebooks that enable more powerful and meaningful queries (e.g. [6]). But how must the complex information transmitted by Ancient Greek literary texts be structured, so as to highlight the event grid that underlies the narration and to give more direct access to the passages that are strictly relevant to each topic? The problem is particularly important for the texts of the ancient historians, which are a source of incalculable value for the history of Antiquity. Yet the events narrated in their works are not only expressed through unstructured natural language, which is in itself already difficult to parse [4]; very often, they are also reshaped according to constraints of ideology and genre.

We propose to use some of the methods of current computational linguistics to address this issue. In particular, we will explore how we can take advantage of the available annotated syntactic corpora (especially the Ancient Greek Dependency Treebank [1]) and upgrade their model with supplementary annotation. The Hellespont Project aims to enrich the text of Thucydides with word-by-word linguistic annotation on morphology, syntax, valency frames and other discursive features such as semantic roles, verbal aspect, anaphora resolution and topic-focus articulation. This task is made possible by adopting the four-level scenario of the Prague Dependency Treebank of Czech [2]. The so-called "tectogrammatical" annotation can provide an outline of the event structure underlying the narration of Thucydides. Semi-automatic linguistic annotation will also be the foundation for the first experiment of a completely data-driven event extraction from the text of an ancient author, following the path of other recent projects in Digital Humanities (e.g. [5, 3]).
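To make the idea concrete, here is a minimal sketch of how a dependency-annotated sentence could be reduced to a rough predicate-argument frame. It is only an illustration under stated assumptions: the `Token` fields, the `extract_events` function and the four-word Greek sentence ("the Athenians built a wall") are invented for this example and are not taken from the project's data or tools. The relation labels (PRED, SBJ, OBJ, ATR) follow the Prague-style labels used in dependency treebanks of this kind. The result is a surface-syntax approximation, not tectogrammatical annotation, which adds semantic roles, valency frames and anaphora on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word. The field names are assumptions for this sketch."""
    id: int        # 1-based position in the sentence
    form: str      # word as it appears in the text
    lemma: str     # dictionary form
    head: int      # id of the governing token, 0 for the root
    relation: str  # syntactic label, e.g. PRED, SBJ, OBJ, ATR


# Invented toy sentence (not a quotation from Thucydides).
SENTENCE = [
    Token(1, "οἱ", "ὁ", 2, "ATR"),
    Token(2, "Ἀθηναῖοι", "Ἀθηναῖος", 4, "SBJ"),
    Token(3, "τεῖχος", "τεῖχος", 4, "OBJ"),
    Token(4, "ᾠκοδόμησαν", "οἰκοδομέω", 0, "PRED"),
]


def extract_events(tokens):
    """Return one frame per PRED token, with its direct SBJ and OBJ dependents."""
    events = []
    for token in tokens:
        if token.relation != "PRED":
            continue
        arguments = {}
        for dependent in tokens:
            # Keep only direct dependents of the predicate with a core label.
            if dependent.head == token.id and dependent.relation in ("SBJ", "OBJ"):
                arguments.setdefault(dependent.relation, []).append(dependent.lemma)
        events.append({"predicate": token.lemma, "arguments": arguments})
    return events


if __name__ == "__main__":
    for event in extract_events(SENTENCE):
        print(event)
```

A frame of this kind ignores coordination, subordinate clauses and omitted subjects, all of which are frequent in Greek prose; handling those is the sort of information the deeper annotation layers are meant to supply.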

Moreover, a fine-grained linguistic analysis of an ancient historical work is useful for more than information representation. A text annotated with syntactic and semantic information allows for a multitude of linguistic and literary studies that can help us understand this masterpiece of historical prose.


[1] David Bamman, Francesco Mambrini, and Gregory Crane. An ownership model of annotation: The Ancient Greek Dependency Treebank. In Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT 8), pages 5–15, Milan, 2009. EDUCatt.

[2] Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. The Prague Dependency Treebank: A three-level annotation scenario. In Anne Abeillé, editor, Treebanks: Building and Using Syntactically Annotated Corpora, pages 103–127. Kluwer Academic Publishers, Boston, 2001.

[3] Agata Cybulska and Piek Vossen. Historical event extraction from text. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 39–43, Portland, Oregon, USA, June 2011.

[4] F. Hogenboom, F. Frasincar, U. Kaymak, and F. de Jong. An overview of event extraction from text. In DeRiVE 2011: Detection, Representation, and Exploitation of Events in the Semantic Web. Proceedings of the Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web, pages 48–57, Bonn, 2011. CEUR.

[5] Nils Reiter, Oliver Hellwig, Anand Mishra, Irina Gossmann, Borayin Larios, Julio Rodrigues, Britta Zeller, and Anette Frank. Adapting standard NLP tools and resources to the processing of ritual descriptions. In Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2010), pages 39–46. Faculty of Science, University of Lisbon, 2010.

[6] Bruce Robertson. Exploring historical RDF with Heml. Digital Humanities Quarterly, 3(1), 2009.

[7] Matteo Romanello and Agnes Thomas. The world of Thucydides: From texts to artefacts and back. In CAA proceedings 2012, pages 276–284. Amsterdam University Press, 2012.


Or you can download the slides from here.