Thursday, 2014-08-21

Temporal Tagging

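Welcome to the temporal tagging page of the DBS group at Heidelberg University. On this page, you can find our papers, a description of HeidelTime, our corpora, and information on reproducing HeidelTime's evaluation results.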

Papers

In the following, we list our research papers related to temporal tagging:

  • Extending HeidelTime for Temporal Expressions Referring to Historic Dates. LREC'14. pdf bibtex
  • Chinese Temporal Tagging with HeidelTime. EACL'14. pdf bibtex
  • Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese. TALIP, 2014. pdf bibtex
  • Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013. pdf bibtex
  • HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3. SemEval'13. pdf bibtex
  • Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC'12. pdf bibtex
  • HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. SemEval'10. pdf bibtex

HeidelTime

HeidelTime is a multilingual, cross-domain temporal tagger that extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard, which is part of the markup language TimeML (with a focus on the "value" attribute). HeidelTime applies different normalization strategies depending on the domain of the documents to be processed: news, narratives (e.g., Wikipedia articles), colloquial (e.g., SMS, tweets), and scientific (e.g., biomedical studies). It is a rule-based system in which the source code and the resources (patterns, normalization information, and rules) are strictly separated, so resources for additional languages can be developed with HeidelTime's well-defined rule syntax. Currently, ten languages are supported: English, Spanish, French, German, Dutch, Italian, Arabic, Vietnamese, Chinese, and Russian.
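To illustrate what TIMEX3 annotations with a normalized "value" attribute look like, the following minimal sketch reads a small TimeML-style string and lists the annotated expressions. The sample text is hand-written for illustration and is not actual HeidelTime output; the function name extract_timex3 is hypothetical, and the sketch assumes the annotated document is well-formed XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written sample in TimeML style (an assumption for illustration,
# not actual HeidelTime output).
SAMPLE = (
    '<TimeML>The meeting on '
    '<TIMEX3 tid="t1" type="DATE" value="2014-08-21">August 21, 2014</TIMEX3>'
    ' lasted '
    '<TIMEX3 tid="t2" type="DURATION" value="PT2H">two hours</TIMEX3>'
    '.</TimeML>'
)


def extract_timex3(timeml: str) -> List[Dict[str, str]]:
    """Return one dict per TIMEX3 element with its text, type, and value."""
    root = ET.fromstring(timeml)
    return [
        {
            "text": (el.text or "").strip(),   # surface form in the document
            "type": el.get("type", ""),        # e.g. DATE, TIME, DURATION, SET
            "value": el.get("value", ""),      # normalized value
        }
        for el in root.iter("TIMEX3")
    ]


if __name__ == "__main__":
    for timex in extract_timex3(SAMPLE):
        print(timex["type"], timex["value"], repr(timex["text"]))
```

The "value" attribute carries the normalized meaning of each expression, so downstream applications can work with it directly instead of interpreting the surface text.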

Corpora

In the context of our work on temporal tagging, we developed several temporally annotated corpora:

  • WikiWarsDE: The German counterpart of WikiWars, i.e., containing Wikipedia documents about several famous wars in history (narrative style documents).
  • WikiWarsVN: The Vietnamese counterpart of WikiWars, i.e., containing Wikipedia documents about several famous wars in history (narrative style documents).
  • AncientTimes: A corpus containing Wikipedia documents about different historic time periods, again war descriptions (narrative style documents).
  • Time4SMS: A corpus containing short messages of the NUS SMS corpus.
  • Time4SCI: A corpus containing scientific abstracts from PubMed about clinical trials.
  • Improved Arabic annotations in the Arabic ACE 2005 data set.
  • Improved Chinese annotations in the TempEval2 Chinese data set.

The corpora (as well as preparation scripts) are available on our download page. For further information on the corpora, see the respective papers.

Reproducing HeidelTime's Evaluation Results

In our papers, we evaluated HeidelTime on many corpora. All the reported evaluation results are reproducible. A description of how to do this can be found on the HeidelTime Google Code project site.

 

Last modified: 27.05.2014