Scientific Motivation

Analyzing cell movement during vertebrate embryogenesis requires a microscope capable of 3D image recording with high spatial and temporal resolution. To track individual cells throughout embryogenesis, the high data acquisition rate must be kept constant over prolonged periods of 24 hours or more (see the survey in [LSM1]). Moreover, correct interpretation of the data requires good statistics from many biological samples.

Maximum intensity projection of an image stack depicting a zebrafish embryo at 24 hours post fertilization.


Therefore, a novel optical system based on digital scanned laser light-sheet microscopy has been built at the APH, KIT Campus South. The microscope records images at a rate exceeding 10⁸ voxels/s. A full 3D stack covering a field of view of 1039 × 876 µm² laterally and 1000 µm axially, sampled at 2560 × 2160 pixels and 500 frames, respectively, is thus acquired within 25 seconds. This acquisition rate was kept stable for more than 24 hours; the total acquisition time of a dataset was limited only by the available local disk storage of 16 TB.
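The quoted figures can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers given above; the function and constant names are illustrative and not part of the microscope software.

```python
# Acquisition parameters as quoted in the text.
WIDTH_PX, HEIGHT_PX, FRAMES = 2560, 2160, 500
FOV_X_UM, FOV_Y_UM, FOV_Z_UM = 1039.0, 876.0, 1000.0
STACK_TIME_S = 25.0


def voxels_per_stack():
    """Number of voxels in one full 3D stack."""
    return WIDTH_PX * HEIGHT_PX * FRAMES


def voxel_rate():
    """Average recording speed in voxels per second."""
    return voxels_per_stack() / STACK_TIME_S


def voxel_size_um():
    """Sampling interval (x, y, z) in micrometres."""
    return (FOV_X_UM / WIDTH_PX, FOV_Y_UM / HEIGHT_PX, FOV_Z_UM / FRAMES)


if __name__ == "__main__":
    print(f"voxels per stack: {voxels_per_stack():,}")
    print(f"voxel rate:       {voxel_rate():.3e} voxels/s")
    print("voxel size (um):  %.3f x %.3f x %.3f" % voxel_size_um())
```

With these parameters the rate comes out at about 1.1 × 10⁸ voxels/s, and the sampling is roughly 0.41 µm laterally and 2 µm axially.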

Recently, the microscope has been used to record more than 24 datasets within six weeks, each covering the first 12-16 hours of zebrafish embryo development, with a total data size above 250 TB. To achieve high sample throughput, the recorded data were transferred after every measurement to the LSDF (KIT Campus North) over a 10 GE network at a rate exceeding 400 MB/s. To sustain such a high data rate, GridFTP, a protocol optimized for the fast transfer of large files, was used (see also next page).
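To put these transfer figures in perspective, the pure transfer time can be estimated from volume and rate. This is a rough sketch: it assumes decimal units (1 TB = 10¹² bytes, 1 MB = 10⁶ bytes) and treats the quoted lower bounds of 250 TB, 24 datasets and 400 MB/s as exact values.

```python
TB = 1e12  # bytes per terabyte (decimal units, an assumption)
MB = 1e6   # bytes per megabyte (decimal units, an assumption)


def transfer_hours(size_bytes, rate_bytes_per_s):
    """Time in hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600.0


total_hours = transfer_hours(250 * TB, 400 * MB)
# Average dataset size if 250 TB are spread evenly over 24 datasets.
mean_dataset_hours = transfer_hours(250 * TB / 24, 400 * MB)

if __name__ == "__main__":
    print(f"all data:        {total_hours:.1f} h ({total_hours / 24:.1f} days)")
    print(f"average dataset: {mean_dataset_hours:.1f} h")
```

Under these assumptions the complete 250 TB corresponds to about 174 hours (a little over 7 days) of transfer time within the six-week measurement period, and an average dataset takes about 7 hours.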

 

Data Analysis

Because of the tremendous amount of microscopy data, time-efficient algorithms for the automated identification of cell nuclei have been developed at the IAI, KIT Campus North. By exploiting a high degree of parallelization during segmentation, a complete 10 TB dataset from a single experiment can be processed in less than 24 hours on the Apache Hadoop computer cluster (KIT Campus North).

Data analysis workflow.


Data Ingest Client

As the LSM data is too large to be stored locally at the experiment, it has to be ingested efficiently into the Large Scale Data Facility (LSDF). The Data Ingest Client automates the ingest process by collecting the data, providing additional (administrative and content) metadata and transferring the data using the Abstract Data Access Layer API (ADALAPI).
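The "collect data and attach metadata" step described above could look roughly as follows. This is a hypothetical sketch and not the Data Ingest Client itself: the function names, the manifest layout and the use of SHA-256 checksums are assumptions made for illustration, and the actual transfer via ADALAPI is not shown.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so that large image files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, administrative, content):
    """Collect all files below data_dir and pair them with the given metadata.

    `administrative` and `content` are plain dicts here (hypothetical layout).
    """
    root = Path(data_dir)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {
        "administrative": administrative,
        "content": content,
        "files": files,
    }
```

A manifest of this kind lets the receiving side verify that every file arrived complete before the local copy is deleted to free disk space.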

 

ADALAPI

The ADALAPI is a library developed at KIT/IPE for transferring data over multiple protocols.

Features:

  • Authentication
  • Multi-Protocol Support
    • GridFTP
    • GSI-SCP
    • WebDAV
    • HDFS
  • Rebuild Canceled Transfers


For LSM, GridFTP is used because of the large data files and the high transfer rate required. An average transfer rate of more than 400 MB/s is reached across all data. In total, more than 250 TB of data were transferred to the LSDF in six weeks.
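The ADALAPI interface itself is not shown on this page. As a stand-in, the following sketch builds a command line for `globus-url-copy`, the standard GridFTP command-line client, using several parallel data streams. The host name, paths and stream count are placeholders, and the function name is hypothetical.

```python
import shlex


def gridftp_command(local_path, host, remote_path, streams=4):
    """Build (but do not run) a globus-url-copy call for one file.

    -vb  print transfer performance while copying
    -cd  create missing destination directories
    -p   number of parallel data streams
    """
    if not local_path.startswith("/") or not remote_path.startswith("/"):
        raise ValueError("both paths must be absolute")
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return [
        "globus-url-copy", "-vb", "-cd", "-p", str(streams),
        f"file://{local_path}",
        f"gsiftp://{host}{remote_path}",
    ]


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    cmd = gridftp_command("/data/run01/stack_0001.raw",
                          "gridftp.example.org",
                          "/lsdf/zebrafish/run01/stack_0001.raw")
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`; the number of parallel streams that gives the best throughput depends on the network and has to be found by measurement.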

Data transfer per day over a period of six weeks.

 

Volume Visualization

Directly after the data ingest, a workflow is started automatically to compute a 3D visualization of the temporal development of a zebrafish embryo. For the visualization, each image stack V(t) of the 16 TB dataset has to be accessed, scaled, and rotated, and a maximum intensity projection has to be computed. On a single workstation this process requires approximately 100 hours of computing time. A parallel computing infrastructure based on Hadoop and attached directly to the LSDF delivers the required computing and data throughput. The result is a movie of less than 50 MB that allows a quick visual quality check of the experiment.
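The core of this step, the maximum intensity projection, can be illustrated in a few lines. The sketch below is a simplified stand-in for the actual workflow: a stack is a nested list indexed as [z][y][x], scaling and rotation by arbitrary angles are omitted, and a side view along x takes the place of a 90 degree rotation. With NumPy arrays the same projection would be `stack.max(axis=0)`.

```python
def mip_z(stack):
    """Project along z: keep the brightest value for every (y, x) position."""
    return [[max(values) for values in zip(*rows)] for rows in zip(*stack)]


def mip_x(stack):
    """Project along x (side view): the result is indexed as [z][y]."""
    return [[max(row) for row in frame] for frame in stack]


def project_series(series, projection=mip_z):
    """Apply a projection to every time point V(t); each call is independent,
    which is what makes the task easy to distribute over a cluster."""
    return [projection(stack) for stack in series]


if __name__ == "__main__":
    # Tiny example volume: 2 frames (z), 2 rows (y), 2 columns (x).
    volume = [[[1, 5], [0, 2]],
              [[3, 4], [7, 1]]]
    print(mip_z(volume))  # [[3, 5], [7, 2]]
    print(mip_x(volume))  # [[5, 2], [4, 7]]
```

Because every time point is processed independently, the per-stack work maps naturally onto the nodes of a cluster, and only the small projected images need to be gathered to assemble the movie.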

Rotation and projection of a time-dependent image stack V(t).

 

Data Access and Generic Search 

Identifying a single dataset hidden in large storage and archiving systems requires efficient, metadata-based search tools. All data stored in the LSDF is linked to metadata. Three types of metadata exist:

 

  • Administrative metadata
  • Content metadata
  • Structural metadata

 

As the administrative metadata has a similar structure for all data stored in the LSDF, a simple search for this type has been implemented as a first step. Content-specific metadata is extracted automatically from the data, stored, and propagated via OAI-PMH to a search engine.
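In OAI-PMH, a harvester such as a search engine retrieves metadata records from a repository with plain HTTP requests. The sketch below builds a `ListRecords` request URL; the base URL is a placeholder, the function name is hypothetical, and `oai_dc` is used because it is the metadata format every OAI-PMH repository must support. Which formats and sets the LSDF repository actually exposes is not stated here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date restricts harvesting to records changed since that date
    (for example "2013-01-01"); set_spec selects a subset of the repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint, for illustration only.
    print(list_records_url("https://repository.example.org/oai",
                           from_date="2013-01-01"))
```

Harvesting incrementally with a `from` date keeps the search index up to date without fetching all records again after every ingest.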


[LSM1] Mikut, R.; Geurts, P.; Hamprecht, F.; Kausler, B.X.; Marée, R.; Mikula, K.; Pantazis, P.; Ronneberger, O.; Stotzka, R.; Strähle, U. & Peyriéras, N.: Automated processing of zebrafish imaging data - a survey. Zebrafish, 2013.


 

Contact:

KIT, IPE: Volker Hartmann, Francesca Rindone, Thomas Jejkal, Rainer Stotzka

KIT, APH: Andrei Kobitski, G. Ulrich Nienhaus

KIT, ITG: Jens C. Otte, Masanari Takamiya, Uwe Strähle

KIT, IAI: Johannes Stegmaier, Ralf Mikut                  

Copyright by SWM, KIT – Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft