CAP 6778 Advanced Data Mining [Spring 2007]

Announcements

The homepage is always under construction. Check the course description and syllabus below to decide if this course suits you.

The paper presentation on April 12th has changed.

Instructor

Dr. Tao Li, Assistant Professor
School of Computer Science and Engineering
Florida International University

Office: ECS 318
Email: taoli AT cs.fiu.edu
Office Hours: Tuesday and Thursday 5:00pm-6:00pm or by appointment

Meeting Time and Location

Thursday 12:30pm-3:15pm, ECS 136

Course Description

Data mining is one of the hottest fields in computer science. Data has been accumulating throughout the computer age in many forms, including database systems, spreadsheets, text files, and, more recently, web pages. Data mining searches through this data for hidden relationships and patterns. This is a special-topics course on data mining. We will cover advanced topics such as web data mining, stream data mining, relational data mining, tree/graph mining, spatiotemporal data indexing and mining, privacy-preserving data mining, high-dimensional data clustering, basics of natural language processing, and social network and linkage analysis.

This course will be highly beneficial to students whose research interests are in databases, data mining, bioinformatics, information retrieval, decision science, or artificial intelligence, and also to those who need to apply data mining in any application area.

Course Syllabus (Subject to revision)

This is a seminar course that focuses on recent developments in advanced data mining techniques and their applications to various problems. After the introductory lectures, subsequent classes will be based mainly on research papers. Topics include:

  • Overview of Basic Data Mining Techniques
  • Mining Data Streams
  • Relational Data Mining
  • Tree/Graph Mining
  • Spatiotemporal Data Indexing and Mining
  • Privacy-preserving Data Mining
  • Similarity Search
  • High-Dimensional Data Clustering
  • Social Network and Linkage Analysis
  • Basics of Natural Language Processing

Prerequisites

 COP5992 Principles of Data Mining or Consent of Instructor

Format and Grading

  • The final grade will be based on the student's presentation and participation (40%) and on assignments and the project (60%). Students who demonstrate excellent research performance by developing the project into a publication will receive extra credit.
  • Everyone needs to present at least one paper (with high-quality PPT/PDF slides). The presenter will also be responsible for leading group discussions and answering questions. In addition, everyone needs to bring a one-page summary of, or comments on, the papers to be presented in class and hand it in right after the class presentations.
  • Everyone will conduct a research project during the course. The project can be a comparative study of existing data mining algorithms for a specific application, the development of new data mining algorithms that to some extent improve on existing methods, or a novel application of existing methods to practical problems. The project can be done individually or in a group of two.
  • You are strongly encouraged to select high-quality papers published in 2005, 2006, or 2007. Please discuss your selection with me before you finalize it.
  • Recommended conference proceedings: SIGKDD, SIGIR, ICML, SIGMOD, ICDM, SDM, etc. Recommended journals: DMKD (Data Mining and Knowledge Discovery), SIGKDD Explorations, Machine Learning, Journal of Machine Learning Research, Knowledge and Information Systems (KAIS), IEEE TKDE, etc. Use Google Scholar, CiteSeer, or other Web services to find the papers you want to select.

Textbooks and References

The course materials will mainly consist of presentations and discussions of research papers and research project reports closely related to the data mining topics listed above. Reading material from top conferences and journals will be made available online or in class as required. In addition, lecture notes will be available online.

The following textbooks are highly recommended (you should have at least one of them):

  • Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 2005.
  • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006.
  • Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.

List of References:

  • Tom Mitchell. Machine Learning. McGraw Hill, 1997.
  • R. O. Duda et al. Pattern Classification. Wiley Interscience.
  • Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
  • Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at the FIU Library.

Course Materials

Related Links

Code of Academic Integrity:

University Policies:

For academic misconduct, sexual harassment, religious holy days, and information on services for students with disabilities, see:
©2007 Tao Li. All rights reserved.