Assignment 1
Neural Networks and Deep Learning

CSCI 5922
Fall 2017

Assigned Aug 31
Due Sep 12

Assignment submission

Please read instructions here.

Goal

The goal of this assignment is to introduce neural networks in terms of ideas you are already familiar with:  linear regression and linear-threshold classification.

Part 1

Consider the following table that describes a relationship between two input variables (x1,x2) and an output variable (y).

x1        x2        y
.1227     .2990     +0.1825
.3914     .6392     +0.8882
.7725     .0826     -1.9521
.8342     .0823     -1.9328
.5084     .8025     +1.2246
.9983     .7404     -0.0631

This is part of a larger data set that I created, which you can download in either MATLAB or text format. Using your favorite language, find the least-squares solution to y = w1 * x1 + w2 * x2 + b.
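For concreteness, here is a minimal sketch of the least-squares fit in Python with NumPy. Since the downloaded data set isn't reproduced here, it uses synthetic stand-in data with assumed coefficients; on the real data you would instead load x1, x2, and y from the MATLAB or text file.

```python
import numpy as np

# Synthetic stand-in for the downloaded data (assumed coefficients, not
# the real file): y = 3*x1 - 2*x2 + 0.5 plus a little noise.
rng = np.random.default_rng(0)
n = 100
x1 = rng.random(n)
x2 = rng.random(n)
y = 3.0 * x1 - 2.0 * x2 + 0.5 + 0.01 * rng.standard_normal(n)

# Design matrix [x1, x2, 1]; least squares solves for [w1, w2, b].
A = np.column_stack([x1, x2, np.ones(n)])
(w1, w2, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w1, w2, b)
```

Solving via the pseudo-inverse (np.linalg.pinv(A) @ y) or a statistics package's linear-regression routine would give the same answer, which is one way to respond to (1b).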

(1a) Report the values of w1, w2, and b.
(1b) What function or method did you use to find the least-squares solution?

Part 2

Using the LMS algorithm, write a program that determines the coefficients {w1, w2, b} via incremental updating, steepest descent, and multiple passes through the training data. You will need to experiment with updating rules (online, batch, minibatch), step sizes (i.e., learning rates), stopping criteria, etc. Experiment to find settings that yield solutions in the fewest sweeps through the data.
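A sketch of the online (per-example) LMS variant, again on synthetic stand-in data since the real file isn't included here; the learning rate, epoch cap, and stopping threshold are placeholder values you would tune:

```python
import numpy as np

# Synthetic stand-in for the assignment data (assumed, not the real file).
rng = np.random.default_rng(0)
n = 100
x1 = rng.random(n)
x2 = rng.random(n)
y = 3.0 * x1 - 2.0 * x2 + 0.5            # assumed noiseless targets

w1 = w2 = b = 0.0
lr = 0.1                                  # step size: tune on the real data
for epoch in range(1000):
    sse = 0.0
    for i in range(n):                    # online: update after each example
        err = y[i] - (w1 * x1[i] + w2 * x2[i] + b)
        # LMS (delta rule): step along the negative error gradient
        w1 += lr * err * x1[i]
        w2 += lr * err * x2[i]
        b += lr * err
        sse += err * err
    if sse < 1e-10:                       # stop once training error is tiny
        break
print(epoch + 1, w1, w2, b)
```

Logging sse once per epoch, as above, gives exactly the error-vs-epoch curve that (2c) asks you to plot. A batch variant would accumulate the gradient over all n examples before updating; minibatch sits in between.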

(2a) Report the values of w1, w2, and b.
(2b) What settings worked well for you: online vs. batch vs. minibatch? What step size? How did you decide to terminate?
(2c) Make a graph of error on the entire data set as a function of epoch. An epoch is a complete sweep through all the data.

Part 3

Turn this data set from a regression problem into a classification problem simply by using the sign of y (+ or -) to represent one of two classes. In the data set you download, you'll see a variable z that represents this binary (0 or 1) class. Use the perceptron learning rule to solve for the coefficients {w1, w2, b} of this classification problem.

Two warnings: First, your solution to Part 3 should require only a few lines of code beyond what you wrote for Part 2. Second, the perceptron algorithm will not converge if there is no exact solution for the training data; it will jitter among coefficients that all yield roughly equally good solutions.
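As a sketch, the perceptron rule needs only a small change to the LMS loop: threshold the prediction and update only on misclassified examples. The stand-in data below (assumed, not the real file) is generated with a margin so the loop provably halts; as noted above, the real data may not be separable, which is why the epoch cap matters.

```python
import numpy as np

# Synthetic stand-in for the assignment data (assumed, not the real file).
rng = np.random.default_rng(1)
x1 = rng.random(400)
x2 = rng.random(400)
f = 3.0 * x1 - 2.0 * x2 + 0.5           # assumed underlying linear rule
keep = np.abs(f) > 0.2                   # enforce a margin so training halts
x1, x2 = x1[keep], x2[keep]
z = (f[keep] > 0).astype(int)            # binary (0 or 1) class, like z
n = len(z)

w1 = w2 = b = 0.0
for epoch in range(1000):
    errors = 0
    for i in range(n):
        pred = 1 if w1 * x1[i] + w2 * x2[i] + b > 0 else 0
        delta = z[i] - pred              # -1, 0, or +1
        if delta != 0:
            # Perceptron rule: nudge the boundary toward the missed point
            w1 += delta * x1[i]
            w2 += delta * x2[i]
            b += delta
            errors += 1
    if errors == 0:                      # separable data: converged exactly
        break
acc = 1.0 - errors / n
print(w1, w2, b, acc)
```

Recording 1 - errors/n each epoch gives the accuracy-vs-epoch curve for (3b). Note the update uses the raw error delta with no learning rate; for the perceptron rule, a step size only rescales the solution.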

(3a) Report the values of coefficients w1, w2, and b.
(3b) Make a graph of the accuracy (% correct classification) on the training set as a function of epoch.

Part 4

In machine learning, we really want to train a model based on some data and then expect the model to do well on "out of sample" data. Try this with the code you wrote for Part 3:  Train the model on the first {5, 10, 25, 50, 75} examples in the data set and test the model on the final 25 examples.
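One way to structure the experiment, sketched with stand-in data and a hypothetical train_perceptron helper (your real code would reuse the Part 3 solution and the provided z variable):

```python
import numpy as np

# Synthetic stand-in for the 100-example data set (assumed, not the real file).
rng = np.random.default_rng(2)
x = rng.random((100, 2))
z = (3.0 * x[:, 0] - 2.0 * x[:, 1] + 0.5 > 0).astype(int)  # stand-in for z

def train_perceptron(xs, zs, epochs=500):
    """Hypothetical helper: the perceptron rule from Part 3."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, zi in zip(xs, zs):
            d = zi - (1 if xi @ w + b > 0 else 0)
            if d:
                w += d * xi
                b += d
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

test_x, test_z = x[-25:], z[-25:]                # final 25 examples
results = {}
for m in (5, 10, 25, 50, 75):                    # train on the first m
    w, b = train_perceptron(x[:m], z[:m])
    preds = (test_x @ w + b > 0).astype(int)
    results[m] = float((preds == test_z).mean())
print(results)
```

The results dictionary holds one test accuracy per training-set size, which maps directly onto the bar graph that (4a) asks for.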

(4a) How does performance on the test set vary with the amount of training data? Make a bar graph showing performance for each of the different training set sizes.