Department of Computer Science University of Colorado Boulder
home · events · thesis defenses · 2002-2003 · 

Thesis Defense - Kurgan

ECOT 831

Meta Mining System for Supervised Learning
Lukasz A. Kurgan
Computer Science PhD Candidate

Supervised inductive machine learning is one of several powerful methodologies that can be used to perform a Data Mining task. Data Mining aims to find previously unknown, implicit patterns that are hidden in large data sets. These patterns describe potentially valuable knowledge.

Data Mining techniques have focused on finding knowledge, often expressed in terms of rules, directly from data. More recently, a new Data Mining concept, called Meta Mining, was introduced. It generates knowledge using a two-step procedure: first, meta-data is generated from the input data; next, the meta-data is used to generate meta-rules that constitute the final data model.
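The two-step procedure can be illustrated with a toy sketch. This is not the MetaSqueezer algorithm, whose details are not given here; it only assumes that the input is split into subsets, that "meta-data" means a table of simple rules mined per subset, and that "meta-rules" are the rules that recur across subsets. The function names mine_rules and meta_mine, the min_support parameter, and the example attributes are all hypothetical.

```python
from collections import defaultdict


def mine_rules(rows, label_key="label"):
    """Step 1 helper (hypothetical): keep (attribute, value) pairs that
    occur with exactly one class label within a single subset."""
    seen = defaultdict(set)  # (attribute, value) -> set of labels seen with it
    for row in rows:
        for attr, value in row.items():
            if attr != label_key:
                seen[(attr, value)].add(row[label_key])
    return {pair: next(iter(labels))
            for pair, labels in seen.items() if len(labels) == 1}


def meta_mine(subsets, min_support=2):
    """Toy two-step procedure: data -> meta-data -> meta-rules."""
    # Step 1: generate meta-data, here one rule table per subset.
    meta_data = [mine_rules(subset) for subset in subsets]

    # Step 2: generate meta-rules from the meta-data, here the rules
    # that appear, with the same class, in at least min_support subsets.
    counts = defaultdict(int)
    for rules in meta_data:
        for pair, label in rules.items():
            counts[(pair, label)] += 1
    return sorted(key for key, n in counts.items() if n >= min_support)


# Made-up example data: two subsets of a labeled data set.
subsets = [
    [{"color": "red", "size": "small", "label": "A"},
     {"color": "blue", "size": "small", "label": "B"}],
    [{"color": "red", "size": "large", "label": "A"},
     {"color": "blue", "size": "large", "label": "B"}],
]
meta_rules = meta_mine(subsets)
for (attr, value), label in meta_rules:
    print(f"IF {attr} = {value} THEN class = {label}")
```

In this sketch the "size" attribute yields no rule in either subset because each of its values occurs with both classes, so only the "color" rules survive into the final model. Each step makes a single pass over its input, which loosely mirrors the scalability argument but says nothing about the actual system's complexity.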

In this dissertation we examine a new approach to knowledge generation that combines supervised inductive learning methodologies with Meta Mining. We propose a novel data mining system, called MetaSqueezer, for extracting useful patterns that carry new information about a supervised input data set. The major contribution of this thesis is the design and development of this system, supported by extensive benchmarking results. Two key advantages of the system are its scalability, which results from its linear complexity, and the high compactness of the user-friendly data models that it generates. These two features make it suitable for applications that use megabytes, or even gigabytes, of data.

The usefulness of the system is evaluated both theoretically and empirically via thorough testing. The results show that the system generates very compact data models. They also confirm the linear complexity of the system, which makes it highly applicable to real data. Results of applying the system to cystic fibrosis data are provided. This application generated very useful results, as evaluated by domain experts.

Committee: Krzysztof Cios, University of Colorado at Denver (Chair)
Andrzej Ehrenfeucht, Professor
Clayton Lewis, Professor
Dennis Lezotte, CU School of Medicine
James Martin, Associate Professor
