Advanced Topics of Text Mining (IATM)
The lecture introduces the fundamentals as well as selected advanced topics from the domain of text mining:
- Fundamentals of data modeling and preprocessing, in particular for textual data
- Statistical and algorithmic foundations of the analysis methods
- Basics of computational linguistics and natural language processing for textual data (e.g. morphological analysis, part-of-speech tagging)
- Selected current focus topics such as classification, cluster analysis, sequential pattern mining, association rule mining, and topic modeling, with emphasis on their application to textual data
The focus for this term will be on algorithms for text clustering and topic modeling.
Note: We will only touch on deep learning, machine translation, and NLP as side topics this year. To discuss "advanced topics" in the given amount of time, we need to specialize. Deep learning approaches are a candidate focus for the next iteration of this class.
While the lecture will briefly introduce the fundamentals of text modeling for the algorithms, students who intend to attend the class are encouraged to refresh their knowledge of:
- Linear algebra, in particular vector spaces and matrix operations
- Data structures for retrieval, in particular search trees
- Data mining fundamentals such as k-means clustering and optimization techniques such as EM
- Algorithmen und Datenstrukturen (IAD; Algorithms and Data Structures)
- Knowledge Discovery in Databases (IKDD)
General introductory textbooks (preliminary list):
- T. Hastie, R. Tibshirani, and J. H. Friedman: The Elements of Statistical Learning, 2001.
- C. D. Manning, P. Raghavan, and H. Schütze: Introduction to Information Retrieval, 2008.
- J. Leskovec, A. Rajaraman, and J. D. Ullman: Mining of Massive Datasets, 2014.
- B. Liu: Web Data Mining. Springer, 2011.
- C. C. Aggarwal and C. X. Zhai: Mining Text Data. Springer, 2012.
Note that these textbooks only briefly touch on the advanced topics that we want to discuss in this lecture.