Dennis Aumiller

Phone: +49 (0) 6221 / 54 - 14353
Fax: +49 (0) 6221 / 54 - 14351
Office: INF 205, room 1/312 (first floor)
Email: aumiller(at)informatik.uni-heidelberg(dot)de
Office hours: By appointment

News

2023-07-25: After concluding my time as a research assistant in the Data Science Group, I will be joining Cohere's Data and Evaluation team as a Member of Technical Staff in September!

2023-05-16: Our paper "Evaluating Factual Consistency of Texts with Semantic Role Labeling" has been accepted at the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)! I will be presenting our insights in person at the conference, held jointly with ACL 2023 in Toronto, Canada.

2023-01-18: A new long paper, titled "On the State of German (Abstractive) Text Summarization", has been accepted at the 20th Conference on Database Systems for Business, Technology and Web (BTW'23).

2022-11-01: Our submission to the shared task on Lexical Simplification at the Text Simplification, Accessibility and Readability Workshop (TSAR-2022) obtained the highest scores across all systems! The paper describing our approach will be presented at the TSAR Workshop, co-located with EMNLP'22. A pre-print is available on arXiv.

2022-10-06: Our work EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain has been accepted to the main track of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP'22)! A pre-print and the public version of our dataset are available now.

2022-04-04: More great news! The previously announced pre-print Klexikon: A German Dataset for Joint Summarization and Simplification has been accepted with oral presentation at the 13th Conference on Language Resources and Evaluation (LREC'22)! Similarly, I am looking forward to attending a physical conference again. Feel free to reach out and let me know if you are around Marseilles in late June.

2022-03-31: Our demonstration paper Online DATEing: A Web Interface for Temporal Annotations has been accepted at the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'22)! You can check out a system demonstration online. I will be attending the conference in person, so feel free to reach out if you want to have a coffee in Madrid this summer!

2022-03-01: Our short paper Time for some German? Pre-Training a Transformer-based Temporal Tagger for German has been accepted at the Fifth International Workshop on Narrative Extraction from Texts (Text2Story@ECIR'22)! You can also check out our framework on GitHub.

2022-01-18: A pre-print of a newly released data resource Klexikon: A German Dataset for Joint Summarization and Simplification has been published on arXiv! See here for the paper, or check out the code on GitHub.

2021-07-01: I will be joining Amazon Research Berlin as an Applied Scientist Intern later this year.


About

My main interests are focused on Natural Language Processing on large document collections. Specifically, I investigate suitable models for (multi-)document summarization, particularly on collections that work with domain-specific texts or non-English languages. A secondary focus lies on the incorporation of user-specified aspects during the generation process of summaries, which can enable a wider audience to benefit from the (currently rather limited) applicability of summarization models.
We also previously experimented with other forms of text exploration, including visualizations of information that allow for improved content density, or the re-structuring of document sections in a more (temporally) consistent manner. I strive to ensure our research stays reproducible; as a commitment, most of our methods and artifacts can be found on my GitHub profile or on the Hugging Face Hub.

I previously studied Applied Computer Science with a minor in Computational Linguistics, also at Heidelberg University, and finished my Master of Science in May 2019. Afterwards, I spent four wonderful years as part of the Data Science Group at the Institute of Computer Science, under the supervision of Prof. Dr. Michael Gertz.


Reviewing Activities


Teaching & Supervision

Courses:

  • Lecture Assistant for "Data Science for Text Analytics" (Winter 2022)
  • Lecture Assistant for graduate course "Text Analytics" (Winter 2020)
  • Lecture Assistant for "Databases 1" (Summer 2019, Summer 2020, Summer 2021, Summer 2022)
  • Head Teaching Assistant for graduate course "Complex Network Analysis" (Winter 2018)
  • Teaching Assistant for "Databases 1" (Summer 2016, Summer 2017)
  • Head Teaching Assistant for graduate course "Computer Graphics" (Winter 2016, with Prof. Dr. Filip Sadlo)

Student Practicals:

  • Continual supervision of (under-)graduate semester research projects, since Summer 2019

Co-supervised Seminars:

  • "Modern Information Retrieval" (Summer 2023)
  • "Domain-Specific Question Answering" (Summer 2022)
  • "Knowledge Graphs and NLP" (Summer 2021)
  • "Trends and Topics in Text Analytics" (Summer 2020)

Supervised Master Theses (co-supervision with Michael Gertz):

  • Jiahui Li: "Styled Text Summarization via Domain-Specific Paraphrasing" (Summer 2023)
  • Fabio Becker: "A Generative Model for Dynamic Networks with Community Structures" (Winter 2020)

Supervised Undergraduate Theses (co-supervision with Michael Gertz):

  • Jing Fan: "Assessing Factual Accuracy of Generated Text" (Winter 2022)
  • Mateusz Chrzastek: "Extractive Keyphrases from Noun Chunk Similarity" (Winter 2021)
  • Jan-Gabriel Mylius: "Visual Analysis of Paragraph Similarity" (Winter 2020)
  • Stefan Hickl: "Automatisierte Generierung von Inhaltsverzeichnissen aus PDF-Dokumenten" (Summer 2020)

Faculty Responsibilities:

  • Member of the "Prüfungsausschuss Informatik" (since October 2022)


Research Interests

  • (Multi-)Document Summarization
  • Text Simplification
  • Temporal Tagging
  • Keyphrase Extraction
  • (Temporal) Information Retrieval
  • Broader (German) Natural Language Processing / Machine Learning


Publications

2023

  • Dennis Aumiller, Jing Fan, and Michael Gertz.
    On the State of German (Abstractive) Text Summarization.
    In: Birgitta König-Ries, Stefanie Scherzinger, Wolfgang Lehner, and Gottfried Vossen (eds.), Datenbanksysteme für Business, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS), 06.-10. März 2023, Dresden, Germany, Proceedings P-331. 2023, 195–220
    [code] [paper] [DOI:10.18420/BTW2023-10]
  • Jing Fan, Dennis Aumiller, and Michael Gertz.
    Evaluating Factual Consistency of Texts with Semantic Role Labeling.
    In: Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023). 2023, 89–100
    [code] [paper] [online]

2022

  • Satya Almasian, Dennis Aumiller, and Michael Gertz.
    Time for some German? Pre-Training a Transformer-based Temporal Tagger for German.
    In: Ricardo Campos, Alípio Mário Jorge, Adam Jatowt, Sumit Bhatia, and Marina Litvak (eds.), Proceedings of Text2Story - Fifth Workshop on Narrative Extraction From Texts co-located with 44th European Conference on Information Retrieval, Text2Story@ECIR 2022, Stavanger, Norway, April 10th, 2022 3117. 2022, 83–90
    [online] [code] [CEUR]
  • Dennis Aumiller, Ashish Chouhan, and Michael Gertz.
    EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain.
    In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 7626–7639
    [arXiv] [code] [online]
  • Dennis Aumiller, Satya Almasian, David Pohl, and Michael Gertz.
    Online DATEing: A Web Interface for Temporal Annotations.
    In: Enrique Amigó, Pablo Castells, Julio Gonzalo, Ben Carterette, J. Shane Culpepper, and Gabriella Kazai (eds.), SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022. 2022, 3289–3294
    [online] [code] [DOI:10.1145/3477495.3531670]
  • Dennis Aumiller and Michael Gertz.
    Klexikon: A German Dataset for Joint Summarization and Simplification.
    In: Proceedings of the Language Resources and Evaluation Conference. 2022, 2693–2701
    [arXiv] [code] [dataset] [online]
  • Dennis Aumiller and Michael Gertz.
    UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?
    In: Proceedings of the 1st Workshop on Text Simplification, Accessibility and Readability (TSAR-2022). 2022
    [code] [online]
  • Michael Gertz and Dennis Aumiller.
    Deep Learning und Legal Tech – Eine Bestandsaufnahme.
    In: LegalTech - Zeitschrift für die digitale Rechtsanwendung 1:1. 2022, 30–36
    [online]

2021

  • Dennis Aumiller, Satya Almasian, Sebastian Lackner, and Michael Gertz.
    Structural Text Segmentation of Legal Documents.
    In: Eighteenth International Conference for Artificial Intelligence and Law (ICAIL'21), June 21–25, 2021, São Paulo, Brazil. 2021
    [online] [code] [DOI:10.1145/3462757.3466085]

2020

  • Dennis Aumiller, Satya Almasian, Philip Hausner, and Michael Gertz.
    UniHD@CL-SciSumm 2020: Citation Extraction as Search.
    In: Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard H. Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, and Michal Shmueli-Scheuer (eds.), Proceedings of the First Workshop on Scholarly Document Processing, SDP@EMNLP 2020, Online, November 19, 2020. 2020, 261–269
    [online] [acl]
  • Philip Hausner, Dennis Aumiller, and Michael Gertz.
    TiCCo: Time-Centric Content Exploration.
    In: Mathieu d'Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (eds.), CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020. 2020, 3413–3416
    [online] [demo] [code] [DOI:10.1145/3340531.3417432]
  • Philip Hausner, Dennis Aumiller, and Michael Gertz.
    Time-centric Exploration of Court Documents.
    In: Ricardo Campos, Alípio Mário Jorge, Adam Jatowt, and Sumit Bhatia (eds.), Proceedings of Text2Story - Third Workshop on Narrative Extraction From Texts co-located with 42nd European Conference on Information Retrieval, Text2Story@ECIR 2020, Lisbon, Portugal, April 14th, 2020 [online only] 2593. 2020, 31–37
    [CEUR]
  • Andreas Spitz, Dennis Aumiller, Bálint Soproni, and Michael Gertz.
    A Versatile Hypergraph Model for Document Collections.
    In: Proceedings of the 32nd International Conference on Scientific and Statistical Database Management (SSDBM '20), Vienna, Austria, July 7-9. 2020
    [pdf] [DOI:10.1145/3400903.3400919]

2019

  • Martin Würtz, Dennis Aumiller, Lina Gundelwein, Philipp Jung, Christian Schütz, Kathrin Lehmann, Katalin Tóth, and Karl Rohr.
    DNA accessibility of chromatosomes quantified by automated image analysis of AFM data.
    In: Scientific Reports 9 (1). 2019
    [online] [code] [DOI:10.1038/s41598-019-49163-4]