COP 5992 Principles of Data Mining [Fall 2004]
Announcements
The homepage is always under construction. Check the course description and syllabus below to decide if
this course suits you.
[Nov 30 2004 2:41pm] Quiz 16 Statistics high/low/median=7/0/4.5.
[Nov 29 2004 12:25pm] Final project submission guidelines are posted online.
[Nov 16 2004 2:41pm] Quiz 15 Statistics high/low/median=10/8/10.
[Nov 3 2004 2:41pm] Quiz 14 Statistics high/low/median=10/6.5/9.
[Nov 1 2004 3:08pm] Project 3 Statistics high/low/median=15/12.5/14.
[Oct 31 2004 3:02pm] Midterm II Statistics high/low/median=23/14.5/17.
[Oct 25 2004 3:03pm] Project 4 is now due November 1, 2004.
[Oct 25 2004 3:02pm] Midterm Statistics high/low/median=54/39/45.
[Oct 18 2004 2:02pm] Quiz 11 Statistics high/low/median=10/0/5.
[Oct 13 2004 2:02pm] Answers to Problem 2 of Quiz 11. Quiz 11 Statistics high/low/median=7/2.5/4.5.
[Oct 13 2004 2:02pm] Project 4 is now due October 27, 2004.
[Oct 11 2004 2:02pm] Quiz 10 Statistics high/low/median=10/5/8.
[Oct 11 2004 10:02am] Final project ideas and Project 4 are posted online.
[Oct 10 2004 10:02pm] Links to students' webpages are added.
[Oct 10 2004 2:02pm] The midterm exam is scheduled for Oct. 25, 2004.
[Oct 6 2004 2:02pm] Quiz 8 Statistics high/low/median=10/6/7.
[Oct 6 2004 9:52am] HW 2 Statistics high/low/median=13.5/13/10. The solution is posted online.
[September 29, 2004 3:52pm] Quiz 7 Statistics high/low/median=9.5/5.5/7.
[September 27, 2004 2:52pm] Quiz 6 Statistics high/low/median=9/0/4.5.
[September 27, 2004 2:40pm] Lecture Notes and Book Chapters are posted on line.
[September 22, 2004 2:12pm] Quiz 6 Statistics high/low/median=10/3/7.5.
[September 22, 2004 11:40am] Lecture Notes, Reference and Project 3 are posted on line.
[September 20, 2004 2:12pm] HW1 Statistics high/low/median=10/6/10.
[September 20, 2004 2:12pm] Quiz 5 Statistics high/low/median=8/4.5/6.
[September 20, 2004 2:22pm] Lecture Notes 6 and references are posted on line.
[September 15, 2004 2:22pm] Lecture Notes and References are posted on line.
[September 15, 2004 2:12pm] Quiz 4 Statistics high/low/median=10/8/9.
[September 14, 2004 10:12am] Solution for Written Assignment 1 is on line.
[September 13, 2004 2:50pm] Quiz 3 statistics: high/low/median=10/6.5/8.
[September 13, 2004 2:50pm] Lecture notes and Assignment 2 are posted on line.
[September 8, 2004 3:30pm] Quiz 2 statistics: high/low/median=9/5/6.5.
[September 8, 2004 3:20pm] Lecture notes for class 3 are now online.
[September 7, 2004 10:15am] Deadline for written assignment 1 is extended to September 13, 2004.
[September 1, 2004 2:00pm] Homework 1 and Project 1 are posted.
[August 30, 2004 4:00pm] Email me to get the username and password for accessing the lecture notes and assignments.
Instructor
Dr. Tao Li, Assistant Professor
School of Computer Science and Engineering
Florida International University
Office: ECS 318
Email: taoli AT cs.fiu.edu
Office Hours: Monday and Wednesday 2:30pm-3:30pm or by appointment
Students
Rafael Alpizar, Kasturi Chatterjee, Paresh Gupta, Selim Kalayci, Tom Milledge, Wei Peng, Sonal Sood, Ramakrishna Varadarajan, Chengyong Yang, Erliang Zeng.
Meeting Time and Location
Monday and Wednesday 12:30pm-1:45pm, ECS 143
Course Materials
Course Description
Data mining is the nontrivial extraction of implicit, previously unknown,
and potentially useful information from data. It has gradually matured as a
discipline merging ideas from statistics, machine learning, databases, and related fields.
This course is designed to give graduate-level students an introductory
survey of the methodologies, technologies, mathematics, and algorithms currently
needed by people who do research in data mining or who need to apply data
mining techniques to practical applications.
Emphasis will be placed on both algorithmic and application issues.
Course Syllabus (Subject to revision)
- Mathematical Background for Data Mining
  - Probability Theory
  - Information Theory
  - Basic Linear Algebra
  - Expectation Maximization
- Association Mining
  - Frequent Set Mining
  - Sequence Mining
- Classification
  - Decision Tree Learning
  - Nearest Neighbor
  - VC-dimension, Support Vector Machines
  - Bayesian Networks, Maximum Likelihood, Maximum Entropy
- Feature Selection and Dimension Reduction
- Clustering
  - Traditional approaches (e.g., K-means, hierarchical clustering)
  - Spectral Clustering, Matrix Factorization
  - Subspace Clustering
  - Co-clustering
- Semi-supervised Learning
  - Role of Unlabeled Data
  - Co-training, Minimize Disagreement
- Ensemble Methods
  - Classifier Combination
  - Cluster Combination
- Web Mining
  - Web Content Analysis
  - Discovering Web Communities
  - PageRank, Hubs and Authorities
  - Web Crawling
- Optional Topics
  - Privacy issues in data mining
  - Similarity search
  - Basics of natural language processing
Prerequisites
Students need to know at least one programming language
(e.g., C/C++, Java, or Matlab). Students entering the class with basic knowledge
of probability, statistics, and algorithms will be at an advantage, but the class will be
designed so that anyone with a basic mathematical background can catch up and
fully participate.
Format and Grading
The course assignments include projects, written homework, paper discussions, and presentations.
Research projects will be designed to improve the critical analysis and problem-solving skills of students.
Class attendance is mandatory. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be primarily based on the students' understanding of the course material. Final grades will be calculated as follows.
| Quizzes | 15% |
| Class Participation | 10% |
| Exams | 20% |
| Assignments | 55% |
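As an illustration only, the weights above combine into a final score as a weighted sum. The sketch below assumes each component is first expressed as a percentage from 0 to 100. The function name and the sample scores are hypothetical, and the actual evaluation may also involve instructor judgment, as noted above.

```python
# Weights are taken from the grading table; everything else is illustrative.
WEIGHTS = {
    "quizzes": 0.15,
    "participation": 0.10,
    "exams": 0.20,
    "assignments": 0.55,
}


def final_score(component_percentages):
    """Return the weighted sum of component scores (each on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(component_percentages)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_percentages[name] for name in WEIGHTS)


# Hypothetical student, not real course data.
example = {
    "quizzes": 80.0,
    "participation": 100.0,
    "exams": 75.0,
    "assignments": 90.0,
}
print(round(final_score(example), 2))
```

Because assignments carry 55% of the weight, a 10-point change in the assignment percentage moves the final score by 5.5 points, more than the same change in any other component.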
Policies on Assignments and Exams
All project deliverables and assignments should be submitted before midnight on the due date. The only acceptable excuses for missing an exam are verifiable illness, emergencies, and religious holidays. Please check the exam dates and inform me as early as possible of any conflict due to these reasons.
Misc Links
Textbooks and References
Textbook
- Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, 1999, ISBN: 1558605525
References
- Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001, ISBN: 1558604898.
- Tom Mitchell. Machine Learning. McGraw Hill, 1997.
- Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
- Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at FIU Library.
Additional reading material from top conferences and journals
will be made available online or in class as required. In addition,
lecture notes will be available online.
Code of Academic Integrity:
University Policies:
For academic misconduct, sexual harassment, religious holidays, and information on services for students with disabilities, see:
©2004 Tao Li. All rights reserved.