The analysis of digitized medieval manuscripts is an essential component of historical research, but it is reaching its technological limits because digital methods and algorithms for analyzing these manuscripts are lacking. The BMBF-funded joint research project eCodicology works with the library holdings of roughly 500 medieval manuscripts that were written and collected in the library of the Benedictine Abbey of St. Matthias in Trier (Germany). The manuscripts were digitized and enriched with bibliographic metadata in the project “Virtual Scriptorium St. Matthias” (http://stmatthias.uni-trier.de/). Based on these images, eCodicology aims to develop, test and optimize new algorithms that identify macro- and microstructural layout elements (see figure below) in order to further enrich the manuscripts' metadata.

 

 

[Figure: Manuscript page. Green: page size, red: image size, yellow: text size, blue: initials.]

 

The digitization of the manuscripts and the creation of a metadata schema and models for the XML files according to TEI P5 take place at Trier. Once the digitized images have been ingested into a data repository, they are processed at Karlsruhe [eCOD1]. There, algorithms are adapted or newly developed to identify macro- and microstructural layout elements such as page size, writing space, number of lines, number of columns and the proportion of text to pictorial space. The subsequent scientific evaluation and statistical analysis of the manuscript groups are performed at Darmstadt.
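As a rough illustration of how layout features of this kind could be derived once the regions on a page have been located, the following Python sketch computes simple proportions from bounding boxes. It is not the project's actual algorithm: the `Box` class, the `layout_features` function, the one-box-per-column convention and all pixel values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical representation)."""
    x: int
    y: int
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def layout_features(page: Box, text_blocks: list[Box], images: list[Box]) -> dict[str, float]:
    """Derive simple macrostructural features from already detected regions.

    Assumes that the regions do not overlap each other.
    """
    text_area = sum(b.area for b in text_blocks)
    image_area = sum(b.area for b in images)
    return {
        "page_width": page.width,
        "page_height": page.height,
        "columns": len(text_blocks),  # assumption: one box per text column
        "text_ratio": text_area / page.area,
        "image_ratio": image_area / page.area,
    }


# Illustrative values only, not measurements from a real manuscript.
page = Box(0, 0, 2000, 3000)
columns = [Box(200, 300, 700, 2400), Box(1100, 300, 700, 2400)]
pictures = [Box(200, 100, 600, 150)]
features = layout_features(page, columns, pictures)
```

Detecting the regions themselves is the hard part and is left out here; the sketch only shows how measurements could be turned into comparable, per-page features.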

 

[Figure: Collaboration within eCodicology.]

 

A software framework tied to a web portal will automate the data analysis workflows. It is designed generically so that large amounts of image data can be processed with any desired feature extraction algorithm, and it builds on the components ImageJ and MOA/Weka. Assuming a computing time of one minute per page, a single sequential pass over the Virtual Scriptorium, with a total of 170,000 pages, would take approximately four months. Since algorithm development is a highly iterative task, a cluster for data-intensive computing is indispensable. As a result, hidden relationships among the roughly 500 medieval manuscripts can be detected automatically, and a database of objective, reproducible features differentiated down to the micro level will be created.
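The runtime estimate can be checked with a short calculation. The sketch below uses the page count and per-page time given in the text; the `processing_days` helper, the assumption of ideal parallel scaling and the cluster size of 100 workers are hypothetical and serve only to illustrate why a cluster is needed.

```python
PAGES = 170_000        # pages in the Virtual Scriptorium (from the text)
MINUTES_PER_PAGE = 1   # assumed computing time per page (from the text)


def processing_days(pages: int, minutes_per_page: float, workers: int = 1) -> float:
    """Wall-clock days for one pass, assuming ideal parallel scaling."""
    total_minutes = pages * minutes_per_page
    return total_minutes / workers / (60 * 24)


sequential_days = processing_days(PAGES, MINUTES_PER_PAGE)
# Hypothetical cluster size, chosen only for illustration.
cluster_days = processing_days(PAGES, MINUTES_PER_PAGE, workers=100)
```

Under these assumptions a single sequential pass takes about 118 days, which matches the four months quoted above. With 100 workers and ideal scaling, the same pass would take a little over one day, which makes repeated runs during algorithm development feasible.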

The software framework itself will be integrated as a service into the DARIAH infrastructure so that it can be adapted to a wider range of documents and communities. In this way, eCodicology can demonstrate the potential of computer-aided methods by providing algorithms for the automatic and convenient tagging of medieval manuscripts.


[eCOD1]    Tonne, D.; Stotzka, R.; Jejkal, T.; Hartmann, V.; Pasic, H.; Rapp, A.; Vanscheidt, P.; Neumair, B.; Streit, A.; Garcia, A.; Kurzawe, D.; Kalman, T.; Sanchez Bribian, B. & Rybicki, J.: A Federated Data Zone for the Arts and Humanities. In: Stotzka, R.; Schiffers, M. & Cotronis, Y. (Eds.), Proc. of the 20th Internat. Euromicro Conf. on Parallel, Distributed, and Network-Based Processing, 2012, 189-207

 

Contact:

KIT, IPE: Swati Chandna, Danah Tonne, Rainer Stotzka

Uni-Trier: Hannah Busch, Philipp Vanscheidt, Claudine Moulin

TU-DA: Celia Krause, Andrea Rapp 

Copyright by SWM, KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association