Friday, 2017-05-26


Address:
Database Systems Research Group

Institute of Computer Science
Im Neuenheimer Feld 205
69120 Heidelberg

Phone: +49 (0) 6221 / 54-14350
Fax: +49 (0) 6221 / 54-14351
Email: gertz(at)informatik.uni-heidelberg(dot)de


The database systems group is involved in various multidisciplinary research activities, not only with researchers in computer science but also with researchers in environmental and atmospheric sciences, ecology, climatology, remote sensing, geology, physics and cosmology, bioinformatics, and many other disciplines. The projects below, current and former, give an overview of the fundamental and cutting-edge research our group does and of how data management and database models, techniques, and architectures are applied in the respective project areas.

Temporal Tagging

For our work on STIXX, we developed a temporal tagger called HeidelTime. In the TempEval-2 challenge, HeidelTime achieved the best results for the extraction and normalization of temporal expressions from English documents. HeidelTime currently supports three languages (English, German, and Dutch) and can be applied to different domains, because it uses different normalization strategies depending on the type of document being processed.
HeidelTime is publicly available both as a UIMA component and as a standalone version. You may want to try HeidelTime's online demo or take a closer look at our temporal tagging page.
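To illustrate what document-type-dependent normalization can mean, here is a minimal Python sketch. It assumes two document types: news articles, whose relative expressions such as "tomorrow" are anchored at the document creation time, and narrative texts, where they are anchored at the most recently mentioned explicit date. The regular expressions, the function `tag_dates`, and this two-strategy split are illustrative assumptions; HeidelTime's actual rule resources are far richer and are not reproduced here.

```python
import re
from datetime import date, timedelta

# Hypothetical, tiny rule set: explicit ISO dates plus three relative expressions.
EXPLICIT = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
RELATIVE = re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def tag_dates(text, doc_type, creation_date=None):
    """Return (expression, normalized ISO date) pairs in text order.

    doc_type "news": relative expressions are anchored at creation_date.
    doc_type "narrative": they are anchored at the last explicit date seen so far.
    """
    matches = sorted(
        list(EXPLICIT.finditer(text)) + list(RELATIVE.finditer(text)),
        key=lambda m: m.start(),
    )
    reference = creation_date if doc_type == "news" else None
    results = []
    for m in matches:
        if m.re is EXPLICIT:
            try:
                value = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
            except ValueError:
                continue  # e.g. "2017-13-40" is not a valid calendar date
            if doc_type == "narrative":
                reference = value  # narrative strategy: latest explicit date becomes the anchor
        else:
            if reference is None:
                continue  # no anchor available yet: leave the expression unnormalized
            value = reference + timedelta(days=OFFSETS[m.group(1).lower()])
        results.append((m.group(0), value.isoformat()))
    return results
```

For the sentence "On 2017-05-20 talks started; tomorrow they end.", a news document created on 2017-05-26 resolves "tomorrow" to 2017-05-27, whereas the narrative strategy resolves it to 2017-05-21.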

STIXX: Spatio-Temporal Information Extraction and Exploration

Spatial and temporal data have become ubiquitous in many application domains, such as the geosciences and the life sciences. Sophisticated database management systems are employed to manage such structured data. However, one important source of spatio-temporal information has not yet been fully utilized: unstructured text documents.
In documents, combinations of temporal and spatial expressions form events. In general, an event happens at a specific time and at a specific place; that is, time and space can be seen as two dimensions of an event.
In a series of papers, we proposed a model for organizing spatio-temporal information (i.e., events) in so-called spatio-temporal document profiles. In addition, we developed HeidelTime, a UIMA-based temporal tagger, and TimeTrails, a system for exploring text documents by representing the spatio-temporal information they contain as so-called document trajectories.
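The view of events as points in time and space, and of a document trajectory as the path these events trace, can be sketched with a small data structure. The `Event` fields, the function `document_trajectory`, and the rule that merges consecutive events at the same place are assumptions made for illustration, not the published STIXX or TimeTrails model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a temporal and a spatial expression from one document."""
    when: date   # normalized temporal expression
    where: str   # normalized place name
    lat: float
    lon: float


def document_trajectory(events):
    """Order a document's events by time and return the visited (lat, lon) points.

    Consecutive events at the same coordinates are merged (an illustrative choice),
    so the result is the path through space that the document describes over time.
    """
    trajectory = []
    for ev in sorted(events, key=lambda e: e.when):
        point = (ev.lat, ev.lon)
        if not trajectory or trajectory[-1] != point:
            trajectory.append(point)
    return trajectory
```

Sorting by the temporal dimension and reading off the spatial one is what turns an unordered set of events into a trajectory; a document profile could additionally keep the event counts per place or per time interval.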

TWIPA (TWItter rePository mAnager)

Millions of tweets are continuously crawled and stored in an offline repository for later analysis. As a result, we end up with a number of very large files containing hundreds of millions of tweets originating from a broad geographic extent. These tweets are usually stored in compressed files to save disk space. To analyze this Twitter content, it is essential to maintain and organize it in a database management system that lets users store tweets, index them, pose queries, and retrieve the resulting records efficiently. For this purpose, we have implemented TWIPA, a pipeline framework that digests large amounts of Twitter content and transfers the requested tweets from files into a fast-access, indexable database, namely a MongoDB-based database.
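The core of such a pipeline (stream compressed files, keep only the requested tweets, hand them to the database in batches) might look like the following stdlib-only Python sketch. It assumes gzip-compressed files with one JSON tweet per line; the functions `read_tweets` and `load`, the `keep` predicate, and the batch size are hypothetical and are not TWIPA's actual interface. With PyMongo, `sink` could be a collection's `insert_many` method.

```python
import gzip
import json
from itertools import islice


def read_tweets(path):
    """Stream tweets from a gzip-compressed file (assumed: one JSON object per line).

    `path` may be a filename or an already opened binary file object.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or corrupt records instead of aborting the run


def load(paths, keep, sink, batch_size=1000):
    """Filter tweets from many files and pass them to `sink` in batches.

    `keep` selects the requested tweets; `sink` receives a list of documents
    (hypothetically a MongoDB collection's insert_many). Returns the number loaded.
    """
    def wanted():
        for path in paths:
            for tweet in read_tweets(path):
                if keep(tweet):
                    yield tweet

    stream = wanted()
    total = 0
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return total
        sink(batch)
        total += len(batch)
```

Batching keeps memory use bounded regardless of file size, and passing the sink as a function keeps the reading and filtering logic testable without a running MongoDB instance.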

Completed Projects

Following is a list of finished projects:

  • Outlier Regions in Sensor Networks
  • GeoStreams

Outlier Regions in Sensor Networks

Sensor networks play an important role in applications concerned with environmental monitoring, disaster management, and policy making. Effective and flexible techniques are needed to explore unusual environmental phenomena in sensor readings that are continuously streamed to applications. In this work, we developed models and techniques to detect outlier sensors and to efficiently construct outlier regions from these outlier sensors. For this, we use the concept of degree-based outliers. Compared to traditional binary outlier models (outlier versus non-outlier), this concept allows for a more fine-grained, context-sensitive analysis of anomalous sensor readings and, in particular, for the construction of heterogeneous outlier regions. Such regions reflect the heterogeneity among the outlier sensors and sensor readings that determine their spatial extent, and they support useful data exploration tasks. We demonstrated the effectiveness and utility of our approach using real-world and synthetic sensor data streams.
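A minimal sketch of the two steps, degree-based outlier scoring and region construction, in Python. The degree definition used here (the share of a sensor's neighbors whose reading differs by more than a tolerance) and the threshold-plus-connected-components grouping are hypothetical stand-ins chosen for illustration; they are not the model from the project's publications.

```python
from collections import deque


def outlier_degrees(readings, neighbors, tolerance):
    """Hypothetical degree in [0, 1]: share of neighbors whose reading differs by > tolerance.

    0 means the sensor agrees with all of its neighbors, 1 means it disagrees with every one.
    """
    degrees = {}
    for s, value in readings.items():
        nbrs = neighbors.get(s, [])
        if not nbrs:
            degrees[s] = 0.0  # isolated sensor: no spatial context to judge it against
            continue
        deviating = sum(1 for n in nbrs if abs(readings[n] - value) > tolerance)
        degrees[s] = deviating / len(nbrs)
    return degrees


def outlier_regions(degrees, neighbors, threshold):
    """Group adjacent sensors whose degree reaches `threshold` into regions.

    Each region maps sensor -> degree, so it keeps how strongly each member
    deviates (a heterogeneous region) instead of a flat outlier label.
    """
    seen, regions = set(), []
    for start, d in degrees.items():
        if d < threshold or start in seen:
            continue
        region, queue = {}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search over adjacent outlier sensors
            s = queue.popleft()
            region[s] = degrees[s]
            for n in neighbors.get(s, []):
                if n not in seen and degrees.get(n, 0.0) >= threshold:
                    seen.add(n)
                    queue.append(n)
        regions.append(region)
    return regions
```

Because each region keeps per-sensor degrees, it can contain strongly and mildly deviating sensors side by side, which a binary outlier label would flatten.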


GeoStreams

In this NSF-funded research project, we developed models, techniques, and architectures for the adaptive processing of real-time, remotely sensed, streaming geospatial image data, in particular from the Geostationary Operational Environmental Satellite (GOES) of the National Oceanic and Atmospheric Administration (NOAA).


COMET: COast-to-Mountain Environmental Transect

This NSF-funded project aims to develop a state-of-the-art cyberinfrastructure to facilitate climate research along a transect spanning from Bodega Bay to Lake Tahoe. The cyberinfrastructure is built around integrated access to distributed and varied data collections and sensor data streams; semantic registration of data, models, and analysis tools; semantically aware data query mechanisms; and an orchestration system for advanced scientific workflows. Access to this cyberinfrastructure is provided through a Web-based portal. Prof. Dr. Michael led this project until he moved from UC Davis to the University of Heidelberg.


Last modified: 10 March 2016