
NSF/IIS: CAREER: Mining Log Data for Computing System Management
National Science Foundation
Award Number: NSF IIS-0546280 (January 1, 2006 to December 31, 2010)
Contact Information
Tao Li, PI
School of Computer Science
Florida International University
11200 SW 8th Street, Miami, FL 33199 U.S.A.
Office: (305) 348-6036, Fax: (305) 348-3549
E-mail: taoli at cs.fiu.edu, URL: http://www.cs.fiu.edu/~taoli
List of Supported Students and Staff
- Dingding Wang: Ph.D. student, School of Computer Science, Florida International University
  - Summer Intern at NEC Research, 2007
- Didier Garcia: Undergraduate student, School of Computer Science, Florida International University
  - Summer Intern at IBM Almaden Research Center, 2006
  - Summer Intern at IBM Extreme Blue, 2007
- Ruskin Miller: Undergraduate student, School of Computer Science, Florida International University
- Wei Peng: Ph.D. student, School of Computer Science, Florida International University
  - The Best Graduate Student Research Award, School of Computer Science, Florida International University (2006)
  - Excellence Award, School of Computer Science, 2006-07
  - Summer Intern at Xerox Research, 2006
  - Summer Intern at Xerox Research, 2007
- Xin Wang: Ph.D. student, School of Computer Science, Florida International University
Project Award Information
- Award Number: NSF IIS-0546280
- Duration: January 2006 to December 2010
- Title: CAREER: Mining Log Data for Computing System Management
- Keywords: Data Mining, Log Data, System Log, Computing System Management, Autonomic Computing
Project Summary
Advancements in science and technology have led to increasingly complex
computing systems. With a growing number of heterogeneous software and hardware
components, computing systems are becoming more and more difficult to monitor,
manage, and maintain. The goal of this project is to develop an integrated
framework for mining log data for automatic system management. The project is
conducting research on: (1) developing new methods and tools for log data
organization that create consistency and improve the ability to correlate
entries across multiple log files; (2) developing new methods and tools for
data-driven pattern discovery and problem determination; and (3) developing new
methods and tools to bridge the gap between system management applications and
intelligent techniques. The integrated framework resulting from this research
will provide improved techniques for monitoring, analyzing, and adapting complex
computing systems. Results from this research will be disseminated through
publications and through software tools made publicly available via a web
portal. The educational component of this project includes developing a new
curriculum that incorporates research into the classroom and provides women,
minorities, and undergraduate students with opportunities to participate in
research. Florida International University (FIU) is among the top awarders of
computer science degrees to Hispanic students in the USA, and its history of
involving underrepresented groups in research efforts will be leveraged during
the course of this project. This project will also strengthen industry research
collaborations and have immediate applications to fields other than system
management.
Publications and Products
Journal articles (including accepted)
- Tao Li, Shenghuo Zhu, and Mitsunori Ogihara. Text Categorization via Generalized Discriminant Analysis. Information Processing and Management, to appear as a regular paper, 2008.
- Chris Ding, Tao Li, and Wei Peng. On the Equivalence Between Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing. Computational Statistics and Data Analysis, to appear as a regular paper, 2008.
- Tao Li, Chang-Shing Perng, and Sheng Ma. Guest Editorial: Special Issue on Temporal Data Mining: Theory, Algorithms and Applications. Data Mining and Knowledge Discovery, 16(1): 1-3, 2008.
- Yanfang Ye, Dingding Wang, Tao Li, and Dongyi Ye. An Intelligent PE-Malware Detection System Based on Association Mining. Journal in Computer Virology, to appear, 2008.
- Tao Li. Clustering Based on Matrix Approximation. Knowledge and Information Systems Journal, to appear as a regular paper, 2008.
- Li Zhang, Tao Li, Shi Xia Liu, and Yue Pan. An Integrated System for Building Enterprise Taxonomies. Information Retrieval, to appear as a regular paper, 2008.
- Tao Li, Shenghuo Zhu, and Mitsunori Ogihara. Hierarchical Document Classification Using Automatically Generated Hierarchy. Journal of Intelligent Information Systems, to appear as a regular paper, 2007.
- Tao Li. A Unified View on Clustering Binary Data. Machine Learning, 62(3): 199-215, 2006.
- Tao Li, Shenghuo Zhu, and Mitsunori Ogihara. Using Discriminant Analysis for Multi-class Classification: An Experimental Investigation. Knowledge and Information Systems Journal, 10(4): 453-472, 2006.
- Wei Peng, Tao Li, and Sheng Ma. Mining Log Files for Data-Driven System Management. SIGKDD Explorations, 7(1): 44-51, June 2005.
Refereed Conference Publications
- Shenghuo Zhu, Tao Li, Zhiyuan Chen, Dingding Wang, and Yihong Gong. Dynamic Active Probing of Helpdesk Databases. In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB 2008).
- Heng Huang, Chris Ding, Dijun Luo, and Tao Li. Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and K-Means Clustering. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008).
- Tao Li, Chris Ding, Yi Zhang, and Bo Shao. Knowledge Transformation from Word Space to Document Space. In Proceedings of the 31st Annual International ACM SIGIR Conference (SIGIR 2008).
- Dingding Wang, Tao Li, Shenghuo Zhu, and Chris Ding. Multi-Document Summarization via Sentence-Level Semantic Analysis and Symmetric Matrix Factorization. In Proceedings of the 31st Annual International ACM SIGIR Conference (SIGIR 2008).
- Wei Peng and Tao Li. Author-Topic Evolution Analysis Using Three-way Non-negative Paratucker. In Proceedings of the 31st Annual International ACM SIGIR Conference (SIGIR 2008). (Poster)
- Fei Wang, Tao Li, and Changshui Zhang. Semi-Supervised Clustering via Matrix Factorization. In Proceedings of the 2008 SIAM International Conference on Data Mining (SDM 2008).
- Tao Li and Chris Ding. Weighted Consensus Clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining (SDM 2008).
- Tao Li, Chris Ding, and Michael I. Jordan. Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007).
- Zhongyuan Zhang, Tao Li, Chris Ding, and Xiang-Sun Zhang. Binary Matrix Factorization with Applications. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007).
- Wei Peng, Chris Ding, Tao Li, and Tong Sun. Finding Hotspots in Document Collection. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007).
- Yanfang Ye, Dingding Wang, Tao Li, and Dongyi Ye. IMDS: Intelligent Malware Detection System. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007), to appear.
- Chris Ding, Rong Jin, Tao Li, and Horst D. Simon. A Learning Framework using Green's Function and Kernel Regularization with application for recommender system. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007).
- Wei Peng, Charles Perng, Tao Li, and Haixun Wang. Event Summarization for System Management. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2007).
- Fei Wang, Changshui Zhang, and Tao Li. Regularized Clustering for Documents. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007).
- Chris Ding and Tao Li. Adaptive Dimension Reduction Using Discriminant Analysis and K-means Clustering. In Proceedings of the International Conference on Machine Learning (ICML 2007).
- Fei Wang, Changshui Zhang, and Tao Li. Clustering with Local and Global Regularization. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI 2007).
- Zhiyuan Chen and Tao Li. Addressing Diverse User Preferences in SQL-Query-Result Navigation. In Proceedings of the 2007 ACM SIGMOD Conference (SIGMOD 2007), pages 641-652.
- Tao Li and Chris Ding. The Relationships among Various Nonnegative Matrix Factorization Methods for Clustering. In Proceedings of the 2006 IEEE International Conference on Data Mining (ICDM 2006), pages 362-371.
- Alex F. Wang, Sheng Ma, Liuzhong Yang, and Tao Li. Recommendation on Item Graphs. In Proceedings of the 2006 IEEE International Conference on Data Mining (ICDM 2006), pages 1119-1123.
- Wei Peng and Tao Li. Interval Data Clustering with Applications. In Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pages 355-362.
- Tao Li, Chengliang Zhang, and Shenghuo Zhu. Empirical Studies on Multilabel Classification. In Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pages 86-92.
- Chris Ding, Tao Li, Wei Peng, and Haesun Park. Orthogonal Nonnegative Matrix Tri-factorizations for Clustering. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2006), pages 126-135.
- Chris Ding, Tao Li, and Wei Peng. Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method. In Proceedings of the National Conference on Artificial Intelligence (AAAI-06).
- Weixiang Sun, Tao Li, Wei Peng, and Tong Sun. Incremental Workflow Mining with Optional Patterns. In Proceedings of the 2006 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2006).
- Tao Li and Wei Peng. A Clustering Model Based on Matrix Approximation with Applications to Cluster System Log Files. In Proceedings of the 16th European Conference on Machine Learning (ECML 2005), pages 625-632.
- Tao Li. A General Model for Clustering Binary Data. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2005), pages 188-197.
- Tao Li, Feng Liang, Sheng Ma, and Wei Peng. An Integrated Framework on Mining Logs Files for Computing System Management. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2005), pages 776-781.
Project Impact
- Education: Some of the new research results are used in the Data Mining courses (CAP4770, COP5577, CAP6778) taught to undergraduate and graduate students in the School of Computer Science at Florida International University. In addition, the research results have been, and will continue to be, published in international conferences and journals and distributed worldwide for education and research.
- Course Development: Three courses were developed and offered in the School of Computer Science at Florida International University.
- Collaborations: For this project we have established collaborations with Lawrence Berkeley National Lab, IBM T.J. Watson Research Center, IBM Autonomic Computing, Xerox Research, and NEC Research. Through these collaborations we expect to gain access to real datasets and applications and to produce further research results.
List of any software being distributed by project
We are currently developing the following prototype software toolkits:
List of any datasets being distributed from project
N/A
Other Related Links
- KDD 2008 Workshop on Data Mining Using Matrices and Tensors (DMMT08), Las Vegas, USA, August 2008
- KDD 2006 Workshop on Theory and Practice of Temporal Data Mining, held in conjunction with the Twelfth Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006)
- ICDM 2005 Workshop on Temporal Data Mining: Algorithms, Theory and Applications, held in conjunction with the Fifth IEEE International Conference on Data Mining (ICDM'05)
- ICDM 2004 Workshop on Temporal Data Mining: Algorithms, Theory and Applications, held in conjunction with the Fourth IEEE International Conference on Data Mining (ICDM'04)
- A Special Issue on Temporal Data Mining in the DMKD (Data Mining and Knowledge Discovery) Journal (Call for Papers)