COP 5577 Principles of Data Mining [Fall 2007]

Announcements

The homepage is always under construction. Check the course description and syllabus below to decide if this course suits you.

[9:24pm on Dec. 5th]: Please submit your final project by December 14th (Friday).

[9:24pm on Nov 13]: Quiz 7 Statistics: High/Low/Median=6/0/3.

[9:24pm on Oct 23]: Quiz 6 Statistics: High/Low/Median=10/0/6.

[2:10pm on Oct 23]: HW3 is posted online and is due on Nov. 13th, 2007.

[9:25pm on Oct 16]: The list of final projects is posted online.

[9:24pm on Oct 16]: Quiz 5 Statistics: High/Low/Median=10/0/2.

[9:24pm on Oct 9]: Quiz 4 Statistics: High/Low/Median=10/1.5/5.5.

[9:24pm on Oct 2]: Quiz 3 Statistics: High/Low/Median=9/0/2.

[9:05pm on Oct 2]: Deadline for HW2 has been extended to Oct. 16th, 2007.

[5:24pm on Oct 2]: HW1 Statistics: High/Low/Median=18.5/0/11.5.

Instructor

Dr. Tao Li, Assistant Professor
School of Computer Science and Engineering
Florida International University

Office: ECS 318
Email: taoli AT cs.fiu.edu
Office Hours: Tuesday 2:30pm-4:30pm or by appointment

Meeting Time and Location

Tuesday 6:25pm-9:05pm, ECS 235

Course Materials

Course Description

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It has gradually matured as a discipline that merges ideas from statistics, machine learning, databases, and related fields. This course gives graduate students an introductory survey of the methodologies, technologies, mathematics, and algorithms currently needed by people who do research in data mining or who need to apply data mining techniques in practice. Both algorithmic and application issues will be emphasized.

Course Syllabus (Subject to revision)

  • Mathematical Background for Data Mining
    • Probability Theory
    • Information Theory
    • Basic Linear Algebra
    • Expectation-Maximization
  • Association Mining
    • Frequent Set Mining
    • Sequence Mining
  • Classification
    • Decision Tree Learning
    • Nearest Neighbor
    • Support Vector Machines
    • Bayesian Networks, Maximum Likelihood, Maximum Entropy
    • Feature Selection and Dimension Reduction
  • Clustering
    • Traditional approaches (e.g., K-means, hierarchical clustering)
    • Spectral Clustering, Matrix Factorization
    • Subspace Clustering
    • Co-clustering
  • Ensemble Methods
    • Classifier Combination
    • Cluster Combination
  • Web Mining
    • Web Content Analysis
    • Discovering web communities
    • PageRank, Hubs and Authorities
    • Web Crawling
  • Optional Topics
    • Privacy issues in data mining
    • Similarity search
    • Basics of natural language processing

Prerequisites

Students need to know at least one programming language (e.g., C/C++, Java, or Matlab). Students entering the class with basic knowledge of probability, statistics, and algorithms will be at an advantage, but the class is designed so that anyone with a basic mathematical background can catch up and fully participate.

Format and Grading

The course assignments include projects, written homework, paper discussions, and presentations. Research projects are designed to improve students' critical analysis and problem-solving skills. Class attendance is mandatory. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be based primarily on the students' understanding of the course material. Final grades will be calculated as follows:

Quizzes 15%
Class Participation 10%
Exams 20%
Assignments 55%
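
As a rough illustration of how the weights above combine, the sketch below computes a weighted average. It assumes each component is already expressed as a percentage on a 0-100 scale; the function name `final_grade` and the sample scores are hypothetical, and the actual evaluation may include subjective adjustments that the formula does not capture.

```python
# Weights taken from the grading breakdown above; they sum to 100%.
WEIGHTS = {
    "quizzes": 0.15,
    "participation": 0.10,
    "exams": 0.20,
    "assignments": 0.55,
}


def final_grade(scores):
    """Return the weighted average of component scores.

    Assumption: every score is a percentage on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {"quizzes": 70, "participation": 100, "exams": 80, "assignments": 90}
print(round(final_grade(example), 2))
```

Because assignments carry 55% of the weight, a ten-point change in the assignment average moves the final grade by 5.5 points, while the same change in the quiz average moves it by only 1.5 points.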

Policies on Assignments and Exams

All project deliverables and assignments should be submitted before midnight on the due date. The only acceptable excuses for missing an exam are verifiable illness, emergencies, and religious holidays. Please check the exam dates and inform me as early as possible of any conflict arising from these reasons.

Misc Links

Textbooks and References

Textbook

  • Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.

References

  • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006, Second Edition.
  • Tom Mitchell. Machine Learning. McGraw Hill, 1997.
  • Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
  • Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at the FIU Library.

Additional reading material from top conferences and journals will be made available online or in class as required. Lecture notes will also be available online.

Code of Academic Integrity:

University Policies:

For academic misconduct, sexual harassment, religious holidays, and information on services for students with disabilities, see :

©2007 Tao Li. All rights reserved.