Dr. Erich Schubert


Phone: +49 (0) 6221 / 54 - 14353
Fax: +49 (0) 6221 / 54 - 14351
Office: INF 205, room 1/312 (first floor)
Email: schubert(at)informatik(dot)uni-heidelberg(dot)de
Office hours during semester: by appointment

Sorry: no international internships. Please understand that I will not answer such requests.
Only students enrolled at Heidelberg can apply for my Practicals and Thesis Topics.

News

2017-04-01: I will be offering the lecture Advanced Topics in Text Mining (as a 2+1 class) in the summer term.
2017-03-13: Accepted for publication at TODS: E. Schubert, J. Sander, M. Ester, H.-P. Kriegel, X. Xu: DBSCAN Revisited, Revisited: Why and how you should (still) use DBSCAN.
2017-01-26: Accepted for publication at VLDB: G. Casanova, E. Englmeier, M. E. Houle, P. Kröger, M. Nett, E. Schubert, A. Zimek: Dimensional Testing for Reverse k-Nearest Neighbor Search.
2016-10-24: The SISAP 2016 proceedings are online as: L. Amsaleg, M. E. Houle, E. Schubert: Similarity Search and Applications - 9th International Conference, SISAP 2016, Tokyo, Japan, October 24-26, 2016. Proceedings. Lecture Notes in Computer Science 9939
2016-10-08: Accepted for publication at KAIS: H.-P. Kriegel, E. Schubert, A. Zimek: The (Black) Art of Runtime Evaluation: Are We Comparing Algorithms or Implementations?
2016-09-01: I joined the Database Systems Research group at Heidelberg.

About

I did my PhD in the database systems group at the Ludwig-Maximilians-Universität München before I joined the Database Systems Research group of Prof. Dr. Michael Gertz as a Post-Doc. My thesis was on generalizing outlier detection, and I did some research on change detection on large-scale textual data streams.

I am a lead author of the ELKI data mining toolkit.

Research Interests

  • Data Mining & Text Mining
  • Event Detection and Analysis
  • Clustering and Outlier Detection
  • Information Retrieval & Information Extraction
  • Network Analysis & Graph Algorithms
  • Machine Learning
See also: Google Scholar, DBLP, ORCID, ACM Digital Library, Semantic Scholar, Aminer

Publications

2017

  • Guillaume Casanova, Elias Englmeier, Michael E. Houle, Peer Kröger, Michael Nett, Erich Schubert, and Arthur Zimek.
    Dimensional Testing for Reverse k-Nearest Neighbor Search.
    In: Proceedings of the VLDB Endowment 10 (7). 2017, 769–780
    [pdf] [bibtex]
  • Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu.
    DBSCAN Revisited, Revisited: Why and how you should (still) use DBSCAN.
    In: ACM Transactions on Database Systems (TODS). 2017, accepted for publication

2016

  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen". 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)] [data and results]
  • Laurent Amsaleg, Michael E. Houle, and Erich Schubert (eds.).
    Similarity Search and Applications - 9th International Conference, SISAP 2016, Tokyo, Japan, October 24-26, 2016. Proceedings.
    Lecture Notes in Computer Science 9939. 2016
    [conference homepage] [DOI:10.1007/978-3-319-46759-7] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SPOTHOT: Scalable Detection of Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM), Budapest, Hungary. 2016, 8:1–8:12
    [preprint (pdf)] [DOI:10.1145/2949689.2949699] [bibtex]
  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study.
    In: Data Mining and Knowledge Discovery 30 (4). 2016, 891–927
    [authorized access (Springer)] [data and results] [DOI:10.1007/s10618-015-0444-8] [bibtex]
  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
    In: Knowledge and Information Systems (KAIS). 2016, 1–38
    [authorized access (Springer)] [DOI:10.1007/s10115-016-1004-2]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    Scalable Detection of Emerging Topics and Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen". 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)]

2015

  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles.
    In: Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA), Hanoi, Vietnam. 2015, 19–36
    [preprint (pdf)] [slides (pdf)] [code] [DOI:10.1007/978-3-319-18123-3_2] [bibtex]
  • Erich Schubert, Michael Weiler, and Arthur Zimek.
    Outlier Detection and Trend Detection: Two Sides of the Same Coin.
    In: 1st International Workshop on Event Analytics using Social Media Data at the 15th IEEE International Conference on Data Mining (ICDM), Atlantic City, NJ. 2015, 40–46
    [preprint (pdf)] [DOI:10.1109/ICDMW.2015.79] [bibtex]
  • Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, and Arthur Zimek.
    A Framework for Clustering Uncertain Data.
    In: Proceedings of the VLDB Endowment 8 (12). 2015, 1976–1979
    [ELKI] [pdf] [DOI:10.14778/2824032.2824115] [bibtex]
  • Erich Schubert and OpenStreetMap Contributors.
    Fast Reverse Geocoder using OpenStreetMap data.
    Open Data LMU. 2015
    [code] [data] [DOI:10.5282/ubm/data.61]

2014

  • Xuan Hong Dang, Ira Assent, Raymond T. Ng, Arthur Zimek, and Erich Schubert.
    Discriminative Features for Identifying and Interpreting Outliers.
    In: Proceedings of the 30th International Conference on Data Engineering (ICDE), Chicago, IL. 2014, 88–99
    [preprint (pdf)] [DOI:10.1109/ICDE.2014.6816642] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds.
    In: Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), New York, NY. 2014, 871–880
    [preprint (pdf)] [slides (pdf)] [online demo (static)] [DOI:10.1145/2623330.2623740] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Generalized Outlier Detection with Flexible Kernel Density Estimates.
    In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA. 2014, 542–550
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611973440.63] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Local Outlier Detection Reconsidered: a Generalized View on Locality with Applications to Spatial, Video, and Network Outlier Detection.
    In: Data Mining and Knowledge Discovery 28 (1). 2014, 190–237
    [authorized access (Springer)] [code] [DOI:10.1007/s10618-012-0300-z] [bibtex]

2013

  • Elke Achtert, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Interactive Data Mining with 3D-Parallel-Coordinate-Trees.
    In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY. 2013, 1009–1012
    [ELKI] [DOI:10.1145/2463676.2463696] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Geodetic Distance Queries on R-Trees for Indexing Geographic Data.
    In: Proceedings of the 13th International Symposium on Spatial and Temporal Databases (SSTD), Munich, Germany. 2013, 146–164
    [code] [DOI:10.1007/978-3-642-40235-7_9] [bibtex]
  • Erich Schubert.
    Generalized and Efficient Outlier Detection for Spatial, Temporal, and High-Dimensional Data Mining.
    PhD thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2013
    [Universitätsbibliothek] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Gold Coast, Australia. 2013
    [slides (pdf)]

2012

  • Elke Achtert, Sascha Goldhofer, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Clusterings – Metrics and Visual Support.
    In: Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC. 2012, 1285–1288
    [ELKI] [DOI:10.1109/ICDE.2012.128] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Arbitrarily Oriented Subspaces.
    In: Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), Brussels, Belgium. 2012, 379–388
    [code] [DOI:10.1109/ICDM.2012.21] [bibtex]
  • Erich Schubert, Remigius Wojdanowski, Arthur Zimek, and Hans-Peter Kriegel.
    On Evaluation of Outlier Rankings and Outlier Scores.
    In: Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA. 2012, 1047–1058
    [code] [DOI:10.1137/1.9781611972825.90] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    A Survey on Unsupervised Outlier Detection in High-Dimensional Numerical Data.
    In: Statistical Analysis and Data Mining 5 (5). 2012, 363–387
    [more information] [DOI:10.1002/sam.11161] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 12th International Conference on Data Mining (ICDM), Brussels, Belgium. 2012
    [slides (pdf)] [DOI:10.1109/ICDM.2012.9]

2011

  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Multiple Clustering Solutions.
    In: 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece. 2011, 55–66
    [pdf] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Interpreting and Unifying Outlier Scores.
    In: Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ. 2011, 13–24
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611972818.2] [bibtex]
  • Elke Achtert, Ahmed Hettab, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Spatial Outlier Detection: Data, Algorithms, Visualizations.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 512–516, Best Demonstration Paper Award
    [ELKI] [DOI:10.1007/978-3-642-22922-0_41] [bibtex]
  • Thomas Bernecker, Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Quality of Similarity Rankings in Time Series.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 422–440
    [DOI:10.1007/978-3-642-22922-0_25] [bibtex]

2010

  • Elke Achtert, Hans-Peter Kriegel, Lisa Reichert, Erich Schubert, Remigius Wojdanowski, and Arthur Zimek.
    Visual Evaluation of Outlier Detection Models.
    In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan. 2010, 396–399
    [ELKI] [poster] [DOI:10.1007/978-3-642-12098-5_34] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search Using the Ideas of Ranking and Top-k Retrieval.
    In: Proceedings of the 26th International Conference on Data Engineering (ICDE) Workshop on Ranking in Databases (DBRank), Long Beach, CA. 2010, 4–9
    [more information] [DOI:10.1109/ICDEW.2010.5452771] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces.
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 555–564
    [pdf] [more information] [DOI:10.1007/978-3-642-13818-8_38] [bibtex]
  • Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 482–500
    [pdf] [supplementary material] [DOI:10.1007/978-3-642-13818-8_34] [bibtex]
  • Ines Färber, Stephan Günnemann, Hans-Peter Kriegel, Peer Kröger, Emmanuel Müller, Erich Schubert, Thomas Seidl, and Arthur Zimek.
    On Using Class-Labels in Evaluation of Clusterings.
    In: MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC. 2010
    [pdf]

2009

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    LoOP: Local Outlier Probabilities.
    In: Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China. 2009, 1649–1652
    [pdf] [code] [DOI:10.1145/1645953.1646195] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data.
    In: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Bangkok, Thailand. 2009, 831–838
    [pdf] [slides] [code] [DOI:10.1007/978-3-642-01307-2_86] [bibtex]
  • Elke Achtert, Thomas Bernecker, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series.
    In: Proceedings of the 11th International Symposium on Spatial and Temporal Databases (SSTD), Aalborg, Denmark. 2009, 436–440
    [ELKI] [pdf] [poster] [DOI:10.1007/978-3-642-02982-0_35] [bibtex]

2008

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms.
    In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China. 2008, 418–435
    [pdf] [code] [DOI:10.1007/978-3-540-69497-7_27] [bibtex]
  • Erich Schubert.
    Statistical Approaches for Robustifying Correlation Clustering Algorithms.
    Diploma thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2008

2005

  • Erich Schubert, Sebastian Schaffert, and François Bry.
    Structure-Preserving Difference Search for XML Documents.
    In: Proceedings of the Extreme Markup Languages 2005 Conference, Montreal, Quebec, Canada. 2005
    [proceedings] [code] [bibtex]
  • Patrick F. Riley and Erich Schubert.
    mReplay: Mobile Sports Replay and Fan Democracy.
    In: Axmedis 2005: Proceedings of the 1st International conference on Automated production of Cross Media content for Multi-channel distribution. 2005
    [DOI:10.1400/41109]
  • Erich Schubert.
    Structure Preserving Difference Search in Semistructured Data.
    Project thesis (undergraduate), Ludwig-Maximilians-Universität München, Munich, Germany. 2005