Machine Learning (CSCI 4830)
Assignment 1: Due Wednesday
October 9, 2002
The goal of this assignment is to become familiar with the Weka data mining environment. You will use the following
dataset: data_Assignment1.txt.
This dataset contains 4 attributes (a1, a2, a3 and a4) and 500 instances. The
first three attributes are real-valued and the last attribute is a class value
(either n or p).
Your task is to do the following:
- Convert the data file into an ARFF (Attribute-Relation File Format) file.
- Find the relationships (associations) between all possible combinations of
attributes. Evaluate these relationships with three different learning
algorithms: weka.classifiers.j48.J48 (you will need to use
weka.filters.DiscretizeFilter to discretize the real attributes),
weka.classifiers.LinearRegression, and a third method of your choice.
Use 10-fold cross-validation to evaluate the relationships.
- Summarize the relationships between all the attributes. Are all the
variables predictable from the others? Are all the variables necessary in
the predictions? Which of the three learning algorithms best predicts these
relationships?
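For the conversion step, the following is a minimal sketch of how an ARFF file could be generated with a short Python script. It assumes that each line of data_Assignment1.txt holds the three real values followed by the class label, separated by whitespace or commas; the handout does not specify the file layout, so check the file first. The relation name "assignment1" and the function names are hypothetical.

```python
import re


def to_arff(lines, relation="assignment1"):
    """Build ARFF text from rows of three numbers followed by a class label.

    Assumed input layout (not stated in the handout): four fields per line,
    separated by whitespace and/or commas, e.g. "0.5 1.2 3.4 n".
    """
    header = [
        f"@relation {relation}",
        "",
        "@attribute a1 numeric",
        "@attribute a2 numeric",
        "@attribute a3 numeric",
        "@attribute a4 {n,p}",  # nominal class attribute
        "",
        "@data",
    ]
    rows = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        *numbers, label = fields
        for value in numbers:
            float(value)  # raises ValueError if a field is not numeric
        if label not in ("n", "p"):
            raise ValueError(f"unexpected class label {label!r}")
        rows.append(",".join(fields))
    return "\n".join(header + rows) + "\n"


def convert_file(src_path, dst_path):
    """Read the raw data file and write the ARFF version next to it."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write(to_arff(src))
```

Calling convert_file("data_Assignment1.txt", "data_Assignment1.arff") would then produce a file that the Explorer can open. The same header can of course be typed by hand in a text editor; the script mainly guards against malformed rows.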
I suggest you use the GUI Explorer to preprocess the data and look for relationships, and the Experimenter to rate the three learning
algorithms on the relationships.
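When planning the Explorer and Experimenter runs, it may help to list the prediction tasks up front. The sketch below assumes one possible reading of "all possible combinations of attributes": every attribute is taken in turn as the target, and every non-empty subset of the remaining attributes is used as input. This interpretation, and the function name, are assumptions and not part of the assignment text.

```python
from itertools import combinations

ATTRIBUTES = ["a1", "a2", "a3", "a4"]


def prediction_tasks(attributes=ATTRIBUTES):
    """Return (inputs, target) pairs: each attribute predicted from every
    non-empty subset of the other attributes."""
    tasks = []
    for target in attributes:
        others = [a for a in attributes if a != target]
        for size in range(1, len(others) + 1):
            for inputs in combinations(others, size):
                tasks.append((inputs, target))
    return tasks


for inputs, target in prediction_tasks():
    print(", ".join(inputs), "->", target)
```

Under this reading there are 4 targets with 7 input subsets each, so 28 tasks per learning algorithm. Keep in mind that LinearRegression needs a numeric target, while J48 needs a nominal one, which is why the real attributes have to be discretized before J48 can predict them.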