Advanced Topics of Text Mining (IATM)

Lecture slides for class review (print version)

The lecture introduces the fundamentals of text mining as well as selected advanced topics:
  • fundamentals of data modeling and preprocessing, in particular for textual data
  • statistical and algorithmic foundations of the analysis methods
  • basics of computational linguistics and natural language processing for textual data (e.g. morphological analysis, part-of-speech tagging)
  • selected current focus topics such as classification, cluster analysis, sequential pattern mining, association rule mining, and topic modeling, with emphasis on their application to textual data

Summer Term 2017 Topics

The focus for this term will be on algorithms for text clustering and topic modeling.

Note: We will only touch on deep learning, machine translation, and NLP as side topics this year. To discuss "advanced topics" in the given amount of time, we need to specialize. Deep learning approaches are a candidate focus for the next iteration of this class.

While the lecture will briefly introduce the fundamentals of text modeling needed for the algorithms, students who intend to attend the class are encouraged to refresh their knowledge of:

  • Linear algebra, in particular vector spaces and matrix operations
  • Data structures for retrieval, in particular search trees
  • Data mining fundamentals such as k-means clustering, and optimization techniques such as expectation-maximization (EM)

Location:

The lecture is over.

Recommended prerequisites:

The following prerequisites are strongly recommended, but not formally required:
  • Algorithmen und Datenstrukturen (IAD)
  • Knowledge Discovery in Databases (IKDD)

Literature:

General introductory textbooks (preliminary list):

Note that these textbooks only briefly touch on the advanced topics that we want to discuss in this lecture.