
STIXX: Spatio-Temporal Information Extraction and Exploration

Introduction

Spatial and temporal data have become ubiquitous in many application domains such as the geosciences and life sciences. Sophisticated database management systems are employed to manage such structured data. However, one important source of spatio-temporal information has not been fully utilized: unstructured text documents.

In the last couple of years, there have been significant advances in the areas of temporal information retrieval and spatial information retrieval, which focus on extracting and utilizing temporal and geographic data, respectively, from documents for search and exploration tasks. Interestingly, there is little work that combines models, techniques, and applications from these two areas.

In documents, combinations of temporal and spatial expressions form events. In general, an event happens at a specific time and at a specific place, i.e., space and time can be seen as two dimensions of events.

Extraction and Organization of Spatio-Temporal Information

In a series of papers, we proposed a model for the organization of spatio-temporal information (i.e., events) in so-called spatio-temporal document profiles (document event profiles). For this, we developed a UIMA-based text mining pipeline to perform the following steps:

  1. access different kinds of text corpora (e.g., Wikipedia articles),
  2. perform linguistic preprocessing (e.g., sentence splitting, tokenization, and part-of-speech tagging),
  3. extract and normalize spatial and temporal expressions,
  4. combine spatial and temporal information,
  5. perform final processing (e.g., storing the information in a database or visualizing it).
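
To make steps 2 to 4 more concrete, the following is a minimal Python sketch and not the UIMA-based pipeline itself. It assumes a hypothetical toy gazetteer (`GAZETTEER`), recognizes only four-digit years as temporal expressions, and uses sentence-level co-occurrence as the rule for combining time and place. That rule is an assumption made for illustration; the combination strategy of the actual pipeline is not described here.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Heidelberg": (49.41, 8.71),
    "Singapore": (1.35, 103.82),
    "Uppsala": (59.86, 17.64),
}

# Temporal expressions are limited to four-digit years in this sketch.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


@dataclass(frozen=True)
class Event:
    time: str    # normalized temporal value (here: a four-digit year)
    place: str   # place name as listed in the gazetteer
    lat: float
    lon: float


def split_sentences(text):
    # Step 2 (simplified): naive sentence splitting on end punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_events(text):
    events = []
    for sentence in split_sentences(text):
        # Step 3: extract temporal and spatial expressions.
        years = YEAR.findall(sentence)
        places = [p for p in GAZETTEER
                  if re.search(rf"\b{re.escape(p)}\b", sentence)]
        # Step 4 (assumed rule): pair each time with each place
        # mentioned in the same sentence.
        for year in years:
            for place in places:
                lat, lon = GAZETTEER[place]
                events.append(Event(year, place, lat, lon))
    return events


text = ("The poster was presented in Singapore in 2010. "
        "The workshop took place in Uppsala in 2010. "
        "The group is based in Heidelberg.")
profile = extract_events(text)
for event in profile:
    print(event)
```

The last sentence yields no event because it contains a place but no temporal expression; a real pipeline would need a policy for such partial information.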

Spatio-Temporal Information as Two Dimensions of Events

The events stored in spatio-temporal document profiles can be seen as a chain of events that can be ordered chronologically. Thus, the combination of spatio-temporal information results in a trajectory, a so-called document trajectory. These trajectories can then be used to visualize the documents on a map.
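
As a rough illustration of this idea, the sketch below orders the events of one document chronologically and returns the coordinates as a polyline that a map component could draw. The `Event` type, the function name, and the sample dates and coordinates are hypothetical and chosen only for this example.

```python
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    when: date   # normalized time of the event
    place: str
    lat: float
    lon: float


def document_trajectory(profile):
    """Order a document's events chronologically and return the visited
    coordinates as a list of (lat, lon) points."""
    ordered = sorted(profile, key=lambda e: e.when)
    return [(e.lat, e.lon) for e in ordered]


# Hypothetical profile, listed in text order rather than time order.
profile = [
    Event(date(2010, 9, 13), "Singapore", 1.35, 103.82),
    Event(date(2010, 7, 15), "Uppsala", 59.86, 17.64),
    Event(date(2009, 1, 1), "Heidelberg", 49.41, 8.71),
]

trajectory = document_trajectory(profile)
print(trajectory)
```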

Temporal Tagging

For the extraction and normalization of temporal expressions in text documents, we developed HeidelTime, a temporal tagger with which we participated in the TempEval-2 challenge. There, HeidelTime achieved the best results for both the extraction and the normalization of temporal expressions from English documents. Details on HeidelTime and our work on temporal tagging can be found here.
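
The difference between extraction and normalization can be shown with a toy example. The sketch below is not HeidelTime and does not reflect its rules or output format; it is a small regular-expression tagger, written for illustration, that finds explicit English dates (extraction) and maps them to ISO-style values (normalization). The function name `tag_dates` is hypothetical.

```python
import re

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches explicit dates such as "March 3, 2011" or "July 2014".
DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")(?: (\d{1,2}),)? (\d{4})\b")


def tag_dates(text):
    """Return (surface string, normalized value) pairs for explicit dates."""
    results = []
    for m in DATE.finditer(text):
        month, day, year = m.group(1), m.group(2), m.group(3)
        value = f"{year}-{MONTHS[month]:02d}"
        if day:
            # Day-level granularity is only added when a day was mentioned.
            value += f"-{int(day):02d}"
        results.append((m.group(0), value))
    return results


tagged = tag_dates("The report was filed on March 3, 2011 and revised in July 2014.")
print(tagged)
```

Relative or underspecified expressions such as "yesterday" or "last summer" need a reference time and context to be normalized, which is well beyond this sketch.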

Spatio-Temporal Document Exploration 

For the exploration of spatio-temporal information in documents, we developed TimeTrails. Built on the UIMA-based text mining pipeline, TimeTrails is a system for the extraction, storage, querying, and exploration of spatio-temporal information embedded in text documents. TimeTrails allows the user to query a document collection using textual, temporal, and spatial constraints. The spatio-temporal information extracted from the relevant documents is then visualized as document trajectories, i.e., as a map-based view on textual documents. Using the Multiple Document Visualization view (MDV), TimeTrails shows multiple documents at once, allowing the user to explore the documents for spatio-temporal intersections of their document trajectories. Such intersections indicate that the same events are described in the documents.
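
One possible way to detect such intersections is sketched below. It is an illustration under stated assumptions, not the TimeTrails implementation: two events are treated as intersecting if they carry the same date and lie within a distance threshold of each other. The threshold `max_km`, the event representation, and the sample data are hypothetical.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def intersections(traj_a, traj_b, max_km=25.0):
    """Pairs of events, one from each trajectory, that share a date and lie
    within max_km of each other. Events are (date, (lat, lon)) tuples."""
    return [
        (ev_a, ev_b)
        for ev_a in traj_a
        for ev_b in traj_b
        if ev_a[0] == ev_b[0] and haversine_km(ev_a[1], ev_b[1]) <= max_km
    ]


# Hypothetical trajectories of two documents.
doc_a = [(date(2010, 7, 15), (59.86, 17.64)), (date(2010, 9, 13), (1.35, 103.82))]
doc_b = [(date(2010, 9, 13), (1.29, 103.85)), (date(2011, 1, 1), (49.41, 8.71))]

shared = intersections(doc_a, doc_b)
print(shared)
```

Exact date equality is a deliberately strict choice; temporal expressions of different granularity (a day versus a month, for example) would require an overlap test instead.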

Tools

Publications

Posters

  • Jannik Strötgen and Michael Gertz.

    TimeTrails: A System for Exploring Spatio-Temporal Information in Documents.
    VLDB 2010: 36th International Conference on Very Large Data Bases, Singapore. [pdf]

  • Jannik Strötgen and Michael Gertz.

    HeidelTime: High Quality Extraction and Normalization of Temporal Expressions.
    SemEval 2010: Fifth International Workshop on Semantic Evaluation (at ACL 2010), Uppsala, Sweden. [pdf]

Contact

For further information, please contact Jannik Strötgen.

Last modified: 17.07.2014