COP 5577 Principles of Data Mining [Fall 2007]
Announcements
The homepage is always under construction. Check the course description and syllabus below to decide if
this course suits you.
[9:24pm on Dec. 5th]: Please submit your final project by December 14th (Friday).
[9:24pm on Nov 13]: Quiz 7 Statistics: High/Low/Median=6/0/3.
[9:24pm on Oct 23]: Quiz 6 Statistics: High/Low/Median=10/0/6.
[2:10pm on Oct 23]: HW3 is posted online and is due on Nov. 13th, 2007.
[9:25pm on Oct 16]: The list of final projects is posted online.
[9:24pm on Oct 16]: Quiz 5 Statistics: High/Low/Median=10/0/2.
[9:24pm on Oct 9]: Quiz 4 Statistics: High/Low/Median=10/1.5/5.5.
[9:24pm on Oct 2]: Quiz 3 Statistics: High/Low/Median=9/0/2.
[9:05pm on Oct 2]: Deadline for HW2 has been extended to Oct. 16th, 2007.
[5:24pm on Oct 2]: HW1 Statistics: High/Low/Median=18.5/0/11.5.
Instructor
Dr. Tao Li, Assistant Professor
School of Computer Science and Engineering
Florida International University
Office: ECS 318
Email: taoli AT cs.fiu.edu
Office Hours: Tuesday 2:30pm-4:30pm or by appointment
Meeting Time and Location
Tuesday 6:25pm-9:05pm, ECS 235
Course Materials
Course Description
Data mining is the nontrivial extraction of implicit, previously unknown,
and potentially useful information from data. It has gradually matured as a
discipline merging ideas from statistics, machine learning, databases, and other fields.
This course is designed to give graduate-level students an introductory
survey of the methodologies, technologies, mathematics, and algorithms currently
needed by people who do research in data mining or who may need to apply data
mining techniques to practical applications.
Emphasis will be placed on both algorithmic and application issues.
Course Syllabus (Subject to revision)
- Mathematical Background for Data Mining
  - Probability Theory
  - Information Theory
  - Basic Linear Algebra
  - Expectation-Maximization (EM)
- Association Mining
  - Frequent Set Mining
  - Sequence Mining
- Classification
  - Decision Tree Learning
  - Nearest Neighbor
  - Support Vector Machines
  - Bayesian Networks, Maximum Likelihood, Maximum Entropy
- Feature Selection and Dimension Reduction
- Clustering
  - Traditional approaches (e.g., K-means, hierarchical)
  - Spectral Clustering, Matrix Factorization
  - Subspace Clustering
  - Co-clustering
- Ensemble Methods
  - Classifier Combination
  - Cluster Combination
- Web Mining
  - Web Content Analysis
  - Discovering Web Communities
  - PageRank, Hubs and Authorities
  - Web Crawling
- Optional Topics
  - Privacy issues in data mining
  - Similarity search
  - Basics of natural language processing
Prerequisites
Students need to know at least one programming language
(e.g., C/C++, Java, or Matlab). Students entering the class with basic knowledge
of probability, statistics, and algorithms will be at an advantage, but the class is
designed so that anyone with a basic mathematical background can catch up and
fully participate.
Format and Grading
The course assignments include projects, written homework, paper discussions, and presentations.
Research projects will be designed to improve the critical analysis and problem-solving skills of students.
Class attendance is mandatory. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be primarily based on the students' understanding of the course material. Final grades will be calculated as follows.
| Quizzes | 15% |
| Class Participation | 10% |
| Exams | 20% |
| Assignments | 55% |
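As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale and that the final score is a plain weighted sum; the function name `final_score` and the sample scores are hypothetical, and actual grading may also reflect the subjective evaluation described above.

```python
# Weights taken from the grading table above (they sum to 100%).
WEIGHTS = {
    "Quizzes": 0.15,
    "Class Participation": 0.10,
    "Exams": 0.20,
    "Assignments": 0.55,
}


def final_score(component_scores):
    """Weighted sum of per-component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example = {
    "Quizzes": 70,
    "Class Participation": 100,
    "Exams": 80,
    "Assignments": 90,
}
print(round(final_score(example), 2))
```

Because assignments carry 55% of the weight, a change in the assignment score moves the final score more than the same change in any other component.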
Policies on Assignments and Exams
All project deliverables and assignments should be submitted before midnight on the due date. The only acceptable excuses for missing an exam are verifiable illness, emergencies, and religious holidays. Please check the exam dates and inform me as early as possible of any conflict due to these reasons.
Misc Links
Textbooks and References
Textbook
- Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.
References
- Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Second Edition. Morgan Kaufmann Publishers, 2006.
- Tom Mitchell. Machine Learning. McGraw Hill, 1997.
- Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
- Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at FIU Library.
Additional reading material from top conferences and journals
will be made available online or in class as required. Lecture notes
will also be available online.
Code of Academic Integrity:
University Policies:
For academic misconduct, sexual harassment, religious holidays, and information on services for students with disabilities, see:
©2007 Tao Li. All rights reserved.