Temporal Tagging

Welcome to the temporal tagging page of the DBS group at Heidelberg University.

Stay up-to-date

  • Register on our mailing list for notifications.
  • Follow us on Twitter.

HeidelTime


HeidelTime is a multilingual, cross-domain temporal tagger. It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard, which is part of the markup language TimeML, with a focus on the "value" attribute.
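To make the output format concrete, the following minimal sketch shows what a TIMEX3-annotated sentence can look like and how the normalized "value" attribute can be read from it. The sentence and the helper `extract_timex3` are illustrative assumptions, not HeidelTime output or part of its API; only the `TIMEX3` tag and its `tid`, `type`, and `value` attributes are taken from TimeML.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence (not produced by HeidelTime).
# It is wrapped in a root element so that it parses as well-formed XML.
ANNOTATED = (
    '<TimeML>The paper was presented on '
    '<TIMEX3 tid="t1" type="DATE" value="2015-09-17">September 17, 2015</TIMEX3>'
    ' and the talk lasted '
    '<TIMEX3 tid="t2" type="DURATION" value="PT20M">twenty minutes</TIMEX3>.'
    '</TimeML>'
)


def extract_timex3(xml_text):
    """Return (text, type, value) for every TIMEX3 element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.text, el.get("type"), el.get("value"))
        for el in root.iter("TIMEX3")
    ]


for text, timex_type, value in extract_timex3(ANNOTATED):
    print(f"{text!r}: type={timex_type}, value={value}")
```

The surface strings differ in form, while the "value" attribute carries a normalized, machine-readable representation of the date or duration.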

HeidelTime uses different normalization strategies depending on the domain of the documents to be processed: news, narratives (e.g., Wikipedia articles), colloquial texts (e.g., SMS, tweets), and scientific texts (e.g., biomedical studies). It is a rule-based system whose source code is strictly separated from its resources (patterns, normalization information, and rules). As a result, resources for additional languages can be developed using HeidelTime's well-defined rule syntax without changing the source code.
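The idea of keeping language resources apart from the tagging code can be sketched with a toy example. Everything here is hypothetical: the `RESOURCES` dictionary, its format, and the function `tag_dates` are assumptions made for illustration and do not reflect HeidelTime's actual rule syntax or implementation. The sketch only shows that, with such a separation, supporting another language means adding data rather than changing the engine.

```python
import re

# Toy language resources kept as plain data (NOT HeidelTime's rule syntax).
# "months" maps surface forms to normalized values; "rule" is a regex template.
RESOURCES = {
    "english": {
        "months": {"january": "01", "march": "03", "september": "09"},
        "rule": r"(?P<month>{months}) (?P<day>\d{{1,2}}), (?P<year>\d{{4}})",
    },
    "german": {
        "months": {"januar": "01", "märz": "03", "september": "09"},
        "rule": r"(?P<day>\d{{1,2}})\. (?P<month>{months}) (?P<year>\d{{4}})",
    },
}


def tag_dates(text, language):
    """Language-independent engine: extract full dates and normalize them
    to YYYY-MM-DD using only the resources of the given language."""
    res = RESOURCES[language]
    months = res["months"]
    pattern = re.compile(
        res["rule"].format(months="|".join(map(re.escape, months))),
        re.IGNORECASE,
    )
    results = []
    for m in pattern.finditer(text):
        month = months[m["month"].lower()]
        value = f"{m['year']}-{month}-{int(m['day']):02d}"
        results.append((m.group(0), value))
    return results


print(tag_dates("Signed on March 5, 2014 in Paris.", "english"))
print(tag_dates("Unterzeichnet am 5. März 2014.", "german"))
```

In this sketch both calls run through the same function; only the dictionary entry selected by `language` differs.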

Currently, 13 languages are supported with manually developed resources: English, Spanish, French, German, Dutch, Italian, Arabic, Vietnamese, Chinese, Russian, Croatian, Portuguese and Estonian.

In addition, starting with version 2.0, automatically created resources for more than 200 languages are available. These are of lower quality than the manually developed resources, but since temporal tagging has never been addressed for many of these languages, HeidelTime can serve as a baseline or as a starting point for further improvements for these 200+ languages.

Papers


In the following, we list our research papers related to temporal tagging:

  • A Baseline Temporal Tagger for All Languages. EMNLP'15. pdf bibtex
  • HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags. EVALITA'14. pdf bibtex
  • Extending HeidelTime for Temporal Expressions Referring to Historic Dates. LREC'14. pdf bibtex
  • Chinese Temporal Tagging with HeidelTime. EACL'14. pdf bibtex
  • Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese. TALIP, 2014. pdf bibtex
  • Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013. pdf bibtex
  • HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3. SemEval'13. pdf bibtex
  • Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC'12. pdf bibtex
  • HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. SemEval'10. pdf bibtex

Corpora


In the context of our work on temporal tagging, we developed several temporally annotated corpora:

  • WikiWarsDE: The German counterpart of WikiWars, i.e., containing Wikipedia documents about several famous wars in history (narrative-style documents).
  • WikiWarsVN: The Vietnamese counterpart of WikiWars, i.e., containing Wikipedia documents about several famous wars in history (narrative-style documents).
  • AncientTimes: A corpus containing Wikipedia documents about different historic time periods, again with war descriptions (narrative-style documents).
  • Time4SMS: A corpus containing short messages from the NUS SMS corpus.
  • Time4SCI: A corpus containing scientific abstracts from PubMed about clinical trials.
  • Improved Arabic annotations in the Arabic ACE 2005 data set.
  • Improved Chinese annotations in the TempEval2 Chinese data set.

The corpora (as well as preparation scripts) are available on our download page. For further information on the corpora, see the respective papers.

Reproducing HeidelTime's Evaluation Results

In our papers, we evaluated HeidelTime on many corpora. All reported evaluation results are reproducible. A description of how to do this can be found on the HeidelTime GitHub project site.