CAP 4770 Introduction to Data Mining [Fall 2007]

Announcements

The homepage is always under construction. Check the course description and syllabus below to decide if this course suits you.

[2:03pm on Dec. 10th]  More details on data preparation are posted online.

[2:03pm on Dec. 5th]  Please turn in your final project by Dec. 14th (Friday), 2007.

[4:03pm on Nov. 13th]  Final Project and HW 5 are posted online.

[2:03pm on Oct. 23rd]  Quiz 4 Statistics: High/Low/Median=8/0/8.

[2:01pm on Oct. 9th]  Quiz 3 Statistics: High/Low/Median=8.5/0/3.5.

[2:00pm on Oct. 9th]  The submission deadline for HW3 is postponed to Oct. 23, 2007.

HW1 Statistics:  High/Low/Median=24/13/20.5.

Starting from Sept 11th, 2007, we meet on Tuesday from 11:00am to 1:45pm at ECS 235 every week.

No class on Sept 18th, 2007. The students are expected to try out the following software:

Please email me your meeting preferences (specifying the slots when you are definitely NOT available) at your earliest convenience. We will discuss possible changes in Thursday's class on August 30.

Lecture slides for class 1 are posted online.

Instructor

Dr. Tao Li, Assistant Professor
School of Computer Science and Engineering
Florida International University

Office: ECS 318
Email: taoli AT cs.fiu.edu
Office Hours: Tuesday 2:30pm-4:30pm or by appointment

Meeting Time and Location

Tuesday 11:05am-1:45pm,  ECS 235

Course Description

Data Mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It has gradually matured as a discipline that merges ideas from statistics, machine learning, databases, and other fields. This is an introductory course on Data Mining for junior/senior computer science undergraduate students. Topics include data mining applications, data preparation, data reduction, and various data mining techniques (such as association, clustering, classification, and anomaly detection).

Textbooks and References

Textbook

  • Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2006, Second Edition.

References

  • Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers.
  • Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003. Available online at FIU Library.

Prerequisites

COP 3530

Misc Links

Format and Grading

The course assignments include projects and written homework.  Projects will be designed to improve the critical analysis and problem-solving skills of students. Class attendance is mandatory. In addition, occasional quizzes will be given in class. Evaluation will be a subjective process, but it will be primarily based on the students' understanding of the course material. Final grades will be calculated as follows.

Quizzes 15%
Class Participation 10%
Exams 20%
Assignments 55%
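
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is first converted to a percentage score (0-100); the syllabus does not say how components are normalized, and the names `WEIGHTS` and `final_grade` are hypothetical.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "quizzes": 15,
    "class_participation": 10,
    "exams": 20,
    "assignments": 55,
}


def final_grade(scores):
    """Return the weighted final grade on a 0-100 scale.

    `scores` maps each component name to a percentage (0-100).
    Assumption: every component has already been normalized to a percentage.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum, divided by 100 because the weights are given in percent.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# Made-up scores for illustration only.
example = {
    "quizzes": 80,
    "class_participation": 100,
    "exams": 70,
    "assignments": 90,
}
print(final_grade(example))  # 85.5
```

Because assignments carry 55% of the grade, a change in the assignment score moves the final grade more than the same change in any other component.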

Course Syllabus

  • Data Mining Introduction
  • Data Mining Applications
  • Data Preparation
  • Association Analysis
  • Mining Association Rules
  • Mining Sequential Patterns
  • Mining Temporal Data
  • Mining Spatial Data
  • Mining Graph Patterns
  • Infrequent Patterns Mining
  • Classification and Prediction
  • Clustering
  • Anomaly Detection

Tentative Course Schedule (Subject to revision)

No. Date Agenda HW Assigned HW Collected
1 August 28 (T) Course Organization, Introduction (Lecture Slides)    
2 August 30 (R) Ch1:  Data Mining Introduction (Lecture Slides) Read chapter 1    
3 September 4 (T) Ch2:  Data Pre-processing (Lecture Slides) Read chapter 2  
4 September 6 (R) Ch2:  Data Pre-processing (Lecture Slides)    
5 September 11 (T) Ch2:  Data Pre-processing (Lecture Slides) and Data Characteristics (Lecture Slides); Quiz 1 HW1 (Due Sept 25th, 2007)
6 September 18 (T) No Class. Students are expected to experiment with the following software. Both should be installed in the JCCL lab.
7 September 25 (T) Chapter 3: Data Warehouse and OLAP Technology (Lecture Slides) HW2 (Due Oct. 2): Exercise 3.4, 3.7
8 Oct. 2 (T) Chapter 5: Mining Frequent Patterns (Lecture Slides I) (Lecture Slides II) HW3 (Due Oct. 23)  
9 Oct. 9 (T) Chapter 6: Classification I (Lecture Slides); Quiz 3
10 Oct 16 (T) Chapter 6: Classification II (Lecture Slides)    
11 Oct 23 (T) Chapter 6: Classification III (Lecture Slides) HW4  
12 Oct. 30 (T) Midterm Exam    
13 Nov. 6 (T) Chapter 7: Clustering I (Lecture Slides)    
14 Nov. 13 (T) Chapter 7: Clustering II (Lecture Slides); Quiz 6
15 Nov. 20 (T) Chapter 7: Clustering II (Lecture Slides)    
16 Nov. 27 (T) Chapter 8.3: Mining Sequential Patterns and Chapter 9.1:  Graph Mining (Lecture Slides)    
17 Dec. 4  (T) Chapter 9.2; Chapter 10.4; and chapter 10.5: Text and Web Mining (Lecture Slides I) (Lecture Slides II)    
18        

 

Policies on Assignments and Exams

All project deliverables and assignments should be submitted before midnight on the due date. The only acceptable excuses for missing an exam are verifiable illness, emergencies, and religious holidays. Please check the exam dates and inform me as early as possible of any conflict due to these reasons.

Code of Academic Integrity:

University Policies:

For policies on academic misconduct, sexual harassment, and religious holidays, and for information on services for students with disabilities, see:
©2007 Tao Li. All rights reserved.