Machine Learning (CSCI 4830)

 

Assignment 1: Due Wednesday October 9, 2002

 

 

The goal of this assignment is to become familiar with the Weka data mining environment. You will use the following dataset: data_Assignment1.txt. This dataset contains 4 attributes (a1, a2, a3, and a4) and 500 instances. The first three attributes are real-valued, and the last attribute is a class value (either n or p).

 

Your task is to do the following:

 

  1. Convert the data file into an ARFF (Attribute-Relation File Format) file.
  2. Find the relationships (associations) between all possible combinations of attributes. Evaluate these relationships with three different learning algorithms: weka.classifiers.j48.J48 (you will need to use weka.filters.DiscretizeFilter to discretize the real attributes), weka.classifiers.LinearRegression, and a third method of your choice. Use 10-fold cross-validation to evaluate the relationships.
  3. Summarize the relationships between all the attributes. Are all the variables predictable from the others? Are all the variables necessary in the predictions? Which of the three learning algorithms best predicts these relationships?
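
As a starting point for step 1, the sketch below shows one way the conversion could be scripted. It is a minimal illustration, not the required method. It assumes that data_Assignment1.txt has no header row and that each line holds the four values separated by whitespace or commas; check the actual file before relying on it. The function name to_arff and the relation name assignment1 are made up for this example.

```python
import re


def to_arff(lines, relation="assignment1"):
    """Build ARFF text from rows of three real values plus a class label.

    Assumption: one instance per line, fields separated by whitespace
    or commas, no header row. Adjust if the real file differs.
    """
    header = [
        f"@relation {relation}",
        "",
        "@attribute a1 real",
        "@attribute a2 real",
        "@attribute a3 real",
        "@attribute a4 {n,p}",
        "",
        "@data",
    ]
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        *reals, label = fields
        for value in reals:
            float(value)  # raises ValueError if a value is not numeric
        if label not in ("n", "p"):
            raise ValueError(f"unexpected class label {label!r}")
        rows.append(",".join(fields))
    return "\n".join(header + rows) + "\n"


# Two made-up rows, only to show the output layout.
sample = ["0.5 1.2 3.4 n", "2.0,0.1,7.5,p"]
print(to_arff(sample))
```

To convert the real file, pass an open file object in place of the sample list and write the returned text to a file ending in .arff, which the Explorer can then open.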

 

I suggest you use the GUI Explorer to preprocess the data and look for relationships, and the Experimenter to compare the three learning algorithms on those relationships.
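
To keep track of which relationships to test, it may help to list them before opening the Explorer. The sketch below takes one possible reading of "all possible combinations": every attribute serves in turn as the prediction target, and every non-empty subset of the remaining attributes serves as the set of predictors. That reading is an assumption, and the function name prediction_tasks is made up for this example.

```python
from itertools import combinations

ATTRIBUTES = ["a1", "a2", "a3", "a4"]


def prediction_tasks(attributes):
    """Return (predictors, target) pairs for every target attribute
    and every non-empty subset of the other attributes."""
    tasks = []
    for target in attributes:
        others = [a for a in attributes if a != target]
        for size in range(1, len(others) + 1):
            for predictors in combinations(others, size):
                tasks.append((predictors, target))
    return tasks


tasks = prediction_tasks(ATTRIBUTES)
print(len(tasks))  # 4 targets x 7 non-empty predictor subsets = 28
for predictors, target in tasks:
    print(", ".join(predictors), "->", target)
```

Under this reading there are 28 tasks. Tasks with a real-valued target (a1, a2, a3) suit LinearRegression directly, while J48 needs a nominal target, which is where the discretization in step 2 comes in.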