Dr. Erich Schubert


Phone: +49 (0) 6221 / 54 - 14353
Fax: +49 (0) 6221 / 54 - 14351
Office: INF 205, room 1/312 (first floor)
Email: schubert(at)informatik(dot)uni-heidelberg(dot)de
Office hours during semester: by appointment

Sorry, I do not offer international internships. Please understand that I will not answer such requests.
Only students enrolled at Heidelberg can apply for my Practicals and Thesis Topics.

News

2017-10-15: In winter term 2017/2018, I will be teaching the class Knowledge Discovery in Databases (IKDD).

2017-07-17:  Two papers accepted for the Int. Conf. on Similarity Search and Applications (SISAP) 2017 in Munich: "E. Kirner, E. Schubert, and A. Zimek: Good and Bad Neighborhood Approximations for Outlier Detection Ensembles" and "E. Schubert and M. Gertz: Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection – A Remedy Against the Curse of Dimensionality?"


2017-07-06:  ACM Computing Reviews Notable Books and Articles 2016 includes our publication "G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, and M. E. Houle: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study" (in Data Mining and Knowledge Discovery 2016).

2017-04-01:  I will be offering the lecture Advanced Topics in Text Mining (as 2+1 class) in summer term.
2017-03-13:  Accepted for publication at TODS: E. Schubert, J. Sander, M. Ester, H.-P. Kriegel, and X. Xu. DBSCAN Revisited, Revisited: Why and how you should (still) use DBSCAN (to appear).
2017-01-26:  Accepted for publication at VLDB: G. Casanova, E. Englmeier, M. E. Houle, P. Kröger, M. Nett, E. Schubert, and A. Zimek. Dimensional Testing for Reverse k-Nearest Neighbor Search.
2016-10-24:  The SISAP 2016 proceedings are online as: L. Amsaleg, M. E. Houle, E. Schubert: Similarity Search and Applications - 9th International Conference, SISAP 2016, Tokyo, Japan, October 24-26, 2016. Proceedings. Lecture Notes in Computer Science 9939
2016-10-08:  Accepted for publication at KAIS: H.-P. Kriegel, E. Schubert, A. Zimek: The (Black) Art of Runtime Evaluation: Are We Comparing Algorithms or Implementations?
2016-09-01:  I joined the Database Systems Research group at Heidelberg.

About

I did my PhD in the database systems group at the Ludwig-Maximilians-Universität München before I joined the Database Systems Research group of Prof. Dr. Michael Gertz as a Post-Doc. My thesis was on generalizing outlier detection, and I did some research on change detection on large-scale textual data streams.

I am a lead author of the ELKI data mining toolkit.

Research Interests

  • Data Mining & Text Mining
  • Event Detection and Analysis
  • Clustering and Outlier Detection
  • Information Retrieval & Information Extraction
  • Network Analysis & Graph Algorithms
  • Machine Learning
See also: Google Scholar, DBLP, ORCID, ACM Digital Library, Semantic Scholar, Aminer, Scopus

Publications

2017

  • Evelyn Kirner, Erich Schubert, and Arthur Zimek.
    Good and Bad Neighborhood Approximations for Outlier Detection Ensembles.
    In: Proceedings of the 10th International Conference on Similarity Search and Applications (SISAP), Munich, Germany. 2017, 173–187
    [slides (pdf)] [manuscript (pdf)] [code] [DOI:10.1007/978-3-319-68474-1_12] [bibtex]
  • Erich Schubert, and Michael Gertz.
    Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection - A Remedy Against the Curse of Dimensionality?
    In: Proceedings of the 10th International Conference on Similarity Search and Applications (SISAP), Munich, Germany. 2017, 188–203
    [slides (pdf)] [manuscript (pdf)] [code] [DOI:10.1007/978-3-319-68474-1_13] [bibtex]
  • Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, and Michael Gertz.
    Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding.
    In: CoRR abs/1708.03569. 2017
    [link] [bibtex]
  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
    In: Knowledge and Information Systems (KAIS) 52 (2). 2017, 341–378, Online first 2016, paginated 2017
    [authorized access (Springer)] [DOI:10.1007/s10115-016-1004-2] [bibtex]
  • Guillaume Casanova, Elias Englmeier, Michael E. Houle, Peer Kröger, Michael Nett, Erich Schubert, and Arthur Zimek.
    Dimensional Testing for Reverse k-Nearest Neighbor Search.
    In: Proceedings of the VLDB Endowment 10 (7). 2017, 769–780
    [pdf] [DOI:10.14778/3067421.3067426] [bibtex]
  • Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu.
    DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN.
    In: ACM Transactions on Database Systems (TODS) 42 (3). 2017, 19:1–19:21
    [DOI:10.1145/3068335] [bibtex]
  • Arthur Zimek, and Erich Schubert.
    Outlier Detection.
    In: Ling Liu, and M. Tamer Özsu (eds.), Encyclopedia of Database Systems. 2017, 5, online first, to appear 2018
    [DOI:10.1007/978-1-4899-7993-3_80719-1]

2016

  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen". 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)] [data and results]
  • Laurent Amsaleg, Michael E. Houle, and Erich Schubert (eds.).
    Similarity Search and Applications - 9th International Conference, SISAP 2016, Tokyo, Japan, October 24-26, 2016. Proceedings.
    Lecture Notes in Computer Science 9939. 2016
    [conference homepage] [DOI:10.1007/978-3-319-46759-7] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SPOTHOT: Scalable Detection of Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM), Budapest, Hungary. 2016, 8:1–8:12
    [preprint (pdf)] [DOI:10.1145/2949689.2949699] [bibtex]
  • Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle.
    On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study.
    In: Data Mining and Knowledge Discovery 30 (4). 2016, 891–927, Awarded “ACM Computing Reviews Notable Books and Articles 2016”
    [authorized access (Springer)] [data and results] [DOI:10.1007/s10618-015-0444-8] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    Scalable Detection of Emerging Topics and Geo-spatial Events in Large Textual Streams.
    In: Proceedings of the Conference "Lernen, Wissen, Daten, Analysen". 2016
    [abstract (pdf)] [slides (pdf)] [poster (pdf)]

2015

  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles.
    In: Proceedings of the 20th International Conference on Database Systems for Advanced Applications (DASFAA), Hanoi, Vietnam. 2015, 19–36
    [preprint (pdf)] [slides (pdf)] [code] [DOI:10.1007/978-3-319-18123-3_2] [bibtex]
  • Erich Schubert, Michael Weiler, and Arthur Zimek.
    Outlier Detection and Trend Detection: Two Sides of the Same Coin.
    In: 1st International Workshop on Event Analytics using Social Media Data at the 15th IEEE International Conference on Data Mining (ICDM), Atlantic City, NJ. 2015, 40–46
    [preprint (pdf)] [DOI:10.1109/ICDMW.2015.79] [bibtex]
  • Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, and Arthur Zimek.
    A Framework for Clustering Uncertain Data.
    In: Proceedings of the VLDB Endowment 8 (12). 2015, 1976–1979
    [ELKI] [pdf] [DOI:10.14778/2824032.2824115] [bibtex]
  • Erich Schubert, and OpenStreetMap Contributors.
    Fast Reverse Geocoder using OpenStreetMap data.
    Open Data LMU. 2015
    [code] [data]

2014

  • Xuan Hong Dang, Ira Assent, Raymond T. Ng, Arthur Zimek, and Erich Schubert.
    Discriminative Features for Identifying and Interpreting Outliers.
    In: Proceedings of the 30th International Conference on Data Engineering (ICDE), Chicago, IL. 2014, 88–99
    [preprint (pdf)] [DOI:10.1109/ICDE.2014.6816642] [bibtex]
  • Erich Schubert, Michael Weiler, and Hans-Peter Kriegel.
    SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds.
    In: Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), New York, NY. 2014, 871–880, Included in Wang, Wei. “Data Science for Social Good - 2014 KDD Highlights.” AAAI. 2015.
    [preprint (pdf)] [slides (pdf)] [online demo (static)] [DOI:10.1145/2623330.2623740] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Generalized Outlier Detection with Flexible Kernel Density Estimates.
    In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA. 2014, 542–550
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611973440.63] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Local Outlier Detection Reconsidered: a Generalized View on Locality with Applications to Spatial, Video, and Network Outlier Detection.
    In: Data Mining and Knowledge Discovery 28 (1). 2014, 190–237, Online 2012, paginated 2014
    [authorized access (Springer)] [code] [DOI:10.1007/s10618-012-0300-z] [bibtex]

2013

  • Elke Achtert, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Interactive Data Mining with 3D-Parallel-Coordinate-Trees.
    In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY. 2013, 1009–1012
    [ELKI] [DOI:10.1145/2463676.2463696] [bibtex]
  • Erich Schubert, Arthur Zimek, and Hans-Peter Kriegel.
    Geodetic Distance Queries on R-Trees for Indexing Geographic Data.
    In: Proceedings of the 13th International Symposium on Spatial and Temporal Databases (SSTD), Munich, Germany. 2013, 146–164
    [code] [DOI:10.1007/978-3-642-40235-7_9] [bibtex]
  • Erich Schubert.
    Generalized and Efficient Outlier Detection for Spatial, Temporal, and High-Dimensional Data Mining.
    PhD thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2013
    [Universitätsbibliothek] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Gold Coast, Australia. 2013
    [slides (pdf)]

2012

  • Elke Achtert, Sascha Goldhofer, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Clusterings – Metrics and Visual Support.
    In: Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC. 2012, 1285–1288
    [ELKI] [DOI:10.1109/ICDE.2012.128] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Arbitrarily Oriented Subspaces.
    In: Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), Brussels, Belgium. 2012, 379–388
    [code] [DOI:10.1109/ICDM.2012.21] [bibtex]
  • Erich Schubert, Remigius Wojdanowski, Arthur Zimek, and Hans-Peter Kriegel.
    On Evaluation of Outlier Rankings and Outlier Scores.
    In: Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA. 2012, 1047–1058
    [code] [DOI:10.1137/1.9781611972825.90] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    A Survey on Unsupervised Outlier Detection in High-Dimensional Numerical Data.
    In: Statistical Analysis and Data Mining 5 (5). 2012, 363–387, Included in the “most accessed papers from Statistical Analysis and Data Mining” 2014–2016
    [more information] [DOI:10.1002/sam.11161] [bibtex]
  • Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel.
    Outlier Detection in High-Dimensional Data.
    Tutorial at the 12th International Conference on Data Mining (ICDM), Brussels, Belgium. 2012
    [slides (pdf)] [DOI:10.1109/ICDM.2012.9]

2011

  • Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Evaluation of Multiple Clustering Solutions.
    In: 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece. 2011, 55–66
    [pdf] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Interpreting and Unifying Outlier Scores.
    In: Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ. 2011, 13–24
    [preprint (pdf)] [code] [DOI:10.1137/1.9781611972818.2] [bibtex]
  • Elke Achtert, Ahmed Hettab, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    Spatial Outlier Detection: Data, Algorithms, Visualizations.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 512–516, Best Demonstration Paper Award
    [ELKI] [DOI:10.1007/978-3-642-22922-0_41] [bibtex]
  • Thomas Bernecker, Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Quality of Similarity Rankings in Time Series.
    In: Proceedings of the 12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN. 2011, 422–440
    [DOI:10.1007/978-3-642-22922-0_25] [bibtex]

2010

  • Elke Achtert, Hans-Peter Kriegel, Lisa Reichert, Erich Schubert, Remigius Wojdanowski, and Arthur Zimek.
    Visual Evaluation of Outlier Detection Models.
    In: Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA), Tsukuba, Japan. 2010, 396–399
    [ELKI] [poster] [DOI:10.1007/978-3-642-12098-5_34] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search Using the Ideas of Ranking and Top-k Retrieval.
    In: Proceedings of the 26th International Conference on Data Engineering (ICDE) Workshop on Ranking in Databases (DBRank), Long Beach, CA. 2010, 4–9
    [more information] [DOI:10.1109/ICDEW.2010.5452771] [bibtex]
  • Thomas Bernecker, Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, and Arthur Zimek.
    Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces.
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 555–564
    [pdf] [more information] [DOI:10.1007/978-3-642-13818-8_38] [bibtex]
  • Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
    In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany. 2010, 482–500
    [pdf] [supplementary material] [DOI:10.1007/978-3-642-13818-8_34] [bibtex]
  • Ines Färber, Stephan Günnemann, Hans-Peter Kriegel, Peer Kröger, Emmanuel Müller, Erich Schubert, Thomas Seidl, and Arthur Zimek.
    On Using Class-Labels in Evaluation of Clusterings.
    In: MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC. 2010
    [pdf]

2009

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    LoOP: Local Outlier Probabilities.
    In: Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China. 2009, 1649–1652
    [pdf] [code] [DOI:10.1145/1645953.1646195] [bibtex]
  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data.
    In: Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Bangkok, Thailand. 2009, 831–838
    [pdf] [slides] [code] [DOI:10.1007/978-3-642-01307-2_86] [bibtex]
  • Elke Achtert, Thomas Bernecker, Hans-Peter Kriegel, Erich Schubert, and Arthur Zimek.
    ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series.
    In: Proceedings of the 11th International Symposium on Spatial and Temporal Databases (SSTD), Aalborg, Denmark. 2009, 436–440
    [ELKI] [pdf] [poster] [DOI:10.1007/978-3-642-02982-0_35] [bibtex]

2008

  • Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek.
    A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms.
    In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China. 2008, 418–435
    [pdf] [code] [DOI:10.1007/978-3-540-69497-7_27] [bibtex]
  • Erich Schubert.
    Statistical Approaches for Robustifying Correlation Clustering Algorithms.
    Diploma thesis, Ludwig-Maximilians-Universität München, Munich, Germany. 2008

2005

  • Erich Schubert, Sebastian Schaffert, and François Bry.
    Structure-Preserving Difference Search for XML Documents.
    In: Proceedings of the Extreme Markup Languages 2005 Conference, Montreal, Quebec, Canada. 2005
    [proceedings] [code] [bibtex]
  • Patrick F. Riley, and Erich Schubert.
    mReplay: Mobile Sports Replay and Fan Democracy.
    In: Axmedis 2005: Proceedings of the 1st International Conference on Automated Production of Cross Media Content for Multi-channel Distribution. 2005
    [DOI:10.1400/41109]
  • Erich Schubert.
    Structure Preserving Difference Search in Semistructured Data.
    Project thesis (undergraduate), Ludwig-Maximilians-Universität München, Munich, Germany. 2005