Driven by new digital methods and technologies, humanities research has become increasingly data-intensive in recent years. DARIAH (Digital Research Infrastructure for the Arts and Humanities) was therefore initiated to build a sustainable research infrastructure that directly supports scholars. DARIAH is one of the few humanities and social-science projects on the ESFRI (European Strategy Forum on Research Infrastructures) roadmap and is based on national contributions from the participating countries. Its German part, DARIAH-DE, is led by the Göttingen State and University Library and unites 17 partners from various humanities disciplines, computer science, and data centers.




DARIAH covers a wide range of topics, from the core infrastructure with generic or community-specific services, through ontologies, metadata and license recommendations, to training and education. An essential component of the infrastructure is long-term storage that serves the wide variety of disciplines in the digital humanities and accounts for their special requirements. The data from different projects and scholars varies in size (from a few kilobytes for a text file containing a letter to several gigabytes for a film recording of an opera), in quantity (from a few image files of a rare and valuable manuscript to several million image files of an entire library), and in type, since text, images, audio, and video each come in many different formats. Beyond the data itself, the scholars’ diverse expectations must be considered when designing and implementing services that make storing their data as convenient as possible.


Overview of the DARIAH Storage Architecture.

DARIAH provides several stand-alone services for the sustainable storage of research data. The Storage Service [DARIAH1] offers long-term, redundant file storage and ensures data integrity at the bit-stream level. Because an identifier for this data must remain stable, for instance so that it can be cited in scientific publications, the Persistent Identifier Service provides methods to create persistent identifiers (PIDs) and assign them to the data. Not all data is open to the public, for instance when it contains personal data; such data must be carefully protected from unauthorized access by integrating an Authentication and Authorization Infrastructure (AAI). Dawa, the Data Web Application (see previous page), is the first service to integrate these components.
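The text states that the Storage Service ensures integrity at the bit-stream level but does not say how. One common technique is fixity checking: a cryptographic checksum is recorded at ingest and later recomputed for every redundant copy. The sketch below assumes SHA-256 and local file paths for the replicas. Both are illustrative choices, not the documented DARIAH mechanism.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest by streaming the file in 1 MiB chunks,
    so multi-gigabyte files (e.g. film recordings) need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replicas(expected: str, replicas: list[Path]) -> dict[Path, bool]:
    """Compare each redundant copy against the checksum recorded at ingest.
    Assumption: replicas are reachable as local paths."""
    return {replica: sha256_of(replica) == expected for replica in replicas}
```

A replica whose entry is `False` has suffered bit rot or corruption and could be restored from one of the intact copies.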


Given the heterogeneous research data and the long period of use, it is essential to build a modular infrastructure that can address various services on demand. Adopting standards and standardized interfaces simplifies future adaptation and the exchange of underlying services. Nonetheless, how to provide a sustainable and comprehensive research infrastructure for the arts and humanities remains an open field of research.
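To illustrate why a standardized interface makes underlying services exchangeable, here is a minimal sketch using a Python `Protocol` (structural typing). The names `StorageBackend`, `InMemoryStorage` and `archive` are hypothetical and are not part of any DARIAH API.

```python
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface that every storage service must offer."""

    def put(self, data: bytes) -> str: ...

    def get(self, location: str) -> bytes: ...


class InMemoryStorage:
    """One interchangeable backend; a REST-based backend could replace it
    without any change to client code."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        location = f"obj-{len(self._objects)}"
        self._objects[location] = data
        return location

    def get(self, location: str) -> bytes:
        return self._objects[location]


def archive(backend: StorageBackend, data: bytes) -> str:
    """Client code depends only on the interface, not on a concrete service."""
    return backend.put(data)
```

Because `archive` only relies on `put`, replacing the backend is a matter of passing a different object, which is the kind of exchangeability the paragraph above argues for.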

dawa - Data Web Application

Over the last few decades, data production has increased in nearly every research institution as technical capabilities have grown. With ever-increasing data volumes, many different communities face the question of how to store large amounts of data easily but securely. To provide easy-to-handle and secure long-term storage, a user-friendly web interface, the Data Web Application (dawa), has been implemented for the DARIAH project.

The Data Web Application is a Vaadin-based web application that can be integrated as a portlet into various Liferay portals.

In the context of dawa, a digital object is composed of data and descriptive metadata. Unlike the data, the descriptive metadata is not a mandatory component of a digital object; the simplest form of a digital object is therefore a single data file. Adding metadata is nevertheless recommended, since it enables systematic and detailed searches for digital objects via higher-level services.
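The definition of a digital object can be expressed as a small data structure: data is required, metadata is optional. The class name, its fields and the metadata keys in the usage example are hypothetical and only illustrate the definition above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalObject:
    """Hypothetical model: data is mandatory, descriptive metadata optional."""
    filename: str
    data: bytes
    metadata: Optional[dict[str, str]] = None

    def is_searchable(self) -> bool:
        """Only objects with metadata can be found by metadata-based search."""
        return bool(self.metadata)
```

For example, `DigitalObject("letter.txt", b"...")` is the simplest valid digital object, while passing a dictionary such as `{"title": "Letter"}` as `metadata` makes it findable by higher-level search services.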

dawa’s two main functions are the ingest of digital objects into a storage system and their download from it. Using dawa as a client in front of a storage system requires a RESTful storage API through which dawa and the storage system communicate. A running instance of dawa, using the DARIAH Storage API [DARIAH1] as its storage service, is currently available in the DARIAH Developer Portal.
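The routes of the DARIAH Storage API are not given here (see [DARIAH1]). The sketch below only assumes a generic REST shape: a `POST` to a collection resource for ingest and a `GET` by identifier for download. `BASE_URL` and the `/objects` paths are hypothetical placeholders. The requests are built with the standard library's `urllib.request.Request` but not sent.

```python
import urllib.request

# Hypothetical base URL; replace with the real storage service endpoint.
BASE_URL = "https://storage.example.org/api"


def build_ingest_request(data: bytes, content_type: str) -> urllib.request.Request:
    """POST the raw bytes to a (hypothetical) collection resource."""
    return urllib.request.Request(
        f"{BASE_URL}/objects",
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


def build_download_request(object_id: str) -> urllib.request.Request:
    """GET a single stored object by the identifier returned on ingest."""
    return urllib.request.Request(f"{BASE_URL}/objects/{object_id}", method="GET")
```

Passing either request to `urllib.request.urlopen` would perform the call. An AAI-protected deployment would additionally need authentication headers, which are omitted here.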

The ingest process, carried out on dawa’s “Ingest” tab sheet (see figure), consists of three steps:

1. Creation of a digital object by uploading data (and corresponding descriptive metadata).
2. Ingest into the storage system via the integrated storage service.
3. Assignment of a persistent identifier (PID, Handle service, GWDG, Göttingen, Germany) to the ingested digital object.
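The three ingest steps can be sketched as one function, assuming in-memory stand-ins for the storage system and the PID service. `MockStorage`, `MockPidService`, `ingest` and the prefix `00000` are hypothetical. A real Handle prefix is issued by the Handle service operator, and the real services are reached over their APIs.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical Handle prefix, used only for illustration.
HANDLE_PREFIX = "00000"


@dataclass
class IngestResult:
    pid: str
    location: str


class MockStorage:
    """Stand-in for the storage system behind the RESTful storage API."""

    def __init__(self) -> None:
        self.objects: dict[str, tuple[bytes, Optional[dict[str, str]]]] = {}

    def store(self, data: bytes, metadata: Optional[dict[str, str]]) -> str:
        location = str(uuid.uuid4())
        self.objects[location] = (data, metadata)
        return location


class MockPidService:
    """Stand-in for the PID service: maps each new PID to a storage location."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def register(self, location: str) -> str:
        pid = f"{HANDLE_PREFIX}/{uuid.uuid4()}"
        self.records[pid] = location
        return pid


def ingest(data: bytes, metadata: Optional[dict[str, str]],
           storage: MockStorage, pids: MockPidService) -> IngestResult:
    # Step 1: the uploaded data plus optional metadata form the digital object.
    digital_object = (data, metadata)
    # Step 2: ingest the object into the storage system.
    location = storage.store(*digital_object)
    # Step 3: assign a persistent identifier pointing at the stored object.
    pid = pids.register(location)
    return IngestResult(pid, location)
```

Assigning the PID only after a successful store means no identifier can point at data that was never ingested.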

Digital objects can be downloaded on dawa’s “Download” tab sheet. All information about a digital object can be requested via its PID.
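Download via PID can be sketched as a two-step lookup: resolve the PID to the information recorded for the object, then fetch the data from the storage location found there. In practice the Handle system performs the resolution. Here it is replaced by an in-memory dictionary, and `PidRecord`, `resolve` and `download` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidRecord:
    """Hypothetical information a PID resolves to."""
    location: str
    filename: str
    metadata: Optional[dict[str, str]] = None


def resolve(pid: str, registry: dict[str, PidRecord]) -> PidRecord:
    """Look up everything known about a digital object via its PID."""
    try:
        return registry[pid]
    except KeyError:
        raise LookupError(f"unknown PID: {pid}") from None


def download(pid: str, registry: dict[str, PidRecord],
             storage: dict[str, bytes]) -> tuple[str, bytes]:
    """Resolve the PID to a storage location, then fetch the data from there."""
    record = resolve(pid, registry)
    return record.filename, storage[record.location]
```

Since the PID, not the storage location, is what users keep, the data can later move to another storage system as long as the PID record is updated.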



KIT, IPE: Danah Tonne, Rainer Stotzka, Francesca Rindone

Copyright by SWM, KIT – Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft
