Assignment 3
Neural Networks and Deep Learning
CSCI 7222, Spring 2015
Assigned: Jan 28
Due: Feb 11, before class
Assignment submission
Submit all assignments via CU's Desire2Learn system. Submission instructions are as follows:
- Go to the Desire2Learn web site and enter your identikey and password, or access it through mycuinfo -> Student -> Course information -> Website
- Once on the Desire2Learn web site, select our course, which is labeled "Tpcs Nonsymbolic Ai"
- Select the "assessments" tab in the upper right corner of the page
- Select "dropbox" from the drop-down menu
- Select "assignment 3" and then upload the file
Goal
The goal of this assignment is to build a back propagation net that will process the handprinted digits we worked with for assignment 2. You should be able to re-use the code you wrote to read in the digits and create a data structure.
Data Set
The data set consists of handprinted digits, originally provided by Yann Le Cun. Each digit is described by a 14x14 pixel array. Each pixel has a grey level with value ranging from 0 to 1. The data is split between two files: a training set that contains the examples used for training your neural network, and a test set that contains examples you'll use to evaluate the trained network. Both training and test sets are organized the same way. Each file begins with 250 examples of the digit "0", followed by 250 examples of the digit "1", and so forth up to the digit "9". There are thus 2500 examples in the training set and another 2500 examples in the test set.
Each digit begins with a label on a line by itself, e.g., "train4-17",
where the "4" indicates the target digit, and the "17" indicates the
example number. The next 14 lines contain 14 real values specifying
pixel intensities for pixels in the corresponding row. Finally, there
is a line with 10 integer values indicating the target. The
vector "1 0 0 0 0 0 0 0 0 0" indicates the target 0; the vector "0 0 0
0 0 0 0 0 0 1" indicates the target 9.
Part 1 (suggestion: do this in week 1)
Implement a network with 196 inputs (for the 14x14 digit pattern) and 10 logistic output neurons (for the digit classes 0-9), with direct connections from the inputs to the outputs. Train with a squared error cost function. This is equivalent to performing 10 logistic regressions in parallel (using a squared error cost function for each).
(a) Report how you set learning rates and decided when to stop training
the network.
(b) Make a plot of error as a function of epoch during training.
(c) Compute the accuracy on the test set. To determine whether a response is correct on a test example, check whether the output neuron corresponding to the target digit is the most active.
(d) [OPTIONAL] In addition to plotting the training error as
training proceeds, superimpose a plot of error on the test set.
(I.e., run the test set through the network with weights frozen
at the end of each epoch of training). How does training error
compare to test error? If the training and test sets have similar
statistics and the network isn't overfitting the training set, then the
errors should match pretty well.
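The Part 1 setup can be sketched as a batch gradient-descent loop over 10 parallel logistic units with a squared-error cost. This is a minimal illustration, not the required implementation; the learning rate and epoch count are placeholder values (tuning them is what part (a) asks you to report):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_layer(X, T, lr=0.5, epochs=50):
    """Batch gradient descent for 10 parallel logistic regressions
    with a squared-error cost.
    X: (n, 196) flattened digit inputs; T: (n, 10) one-hot targets.
    lr and epochs are illustrative, not tuned values."""
    n, d = X.shape
    W = np.zeros((d, 10))                  # direct input-to-output weights
    b = np.zeros(10)                       # output biases
    errors = []
    for _ in range(epochs):
        Y = sigmoid(X @ W + b)             # forward pass
        errors.append(0.5 * np.sum((Y - T) ** 2))   # squared-error cost
        delta = (Y - T) * Y * (1 - Y)      # dE/dnet for logistic units
        W -= lr * X.T @ delta / n
        b -= lr * delta.mean(axis=0)
    return W, b, errors
```

The `errors` list gives you the per-epoch training error curve needed for part (b); the argmax over the 10 outputs gives the classification decision for part (c).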
Part 2
Implement a network with 196 inputs and 10 normalized exponential (softmax) output neurons. Train this with a cross entropy error measure. Both the normalized exponential output and the cross entropy error are explained in this Hinton video.
(a) Report how you set learning rates and decided when to stop training
the network.
(b) Make a plot of error as a function of epoch during training.
(c) Compute the accuracy on the test set. To determine whether a response is correct on a test example, check whether the output neuron corresponding to the target digit is the most active.
(d) Do you see much difference between the networks in parts 1 and 2?
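A convenient property of the Part 2 pairing is that the softmax-plus-cross-entropy gradient with respect to each net input simplifies to Y - T. The sketch below shows this; as in Part 1, the learning rate and epoch count are illustrative placeholders:

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax_layer(X, T, lr=0.5, epochs=50):
    """Batch gradient descent with normalized exponential (softmax)
    outputs and a cross-entropy error measure.
    X: (n, 196) inputs; T: (n, 10) one-hot targets.
    lr and epochs are illustrative, not tuned values."""
    n, d = X.shape
    W = np.zeros((d, 10))
    b = np.zeros(10)
    errors = []
    for _ in range(epochs):
        Y = softmax(X @ W + b)
        errors.append(-np.sum(T * np.log(Y + 1e-12)))  # cross-entropy
        delta = Y - T               # softmax + cross-entropy output delta
        W -= lr * X.T @ delta / n
        b -= lr * delta.mean(axis=0)
    return W, b, errors
```

Note how the weight update is structurally identical to Part 1; only the output nonlinearity, the error measure, and hence the delta term change.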
Part 3
Implement a three-layer back propagation network on the digits data set. Make the architecture strictly layered: input-to-hidden, hidden-to-output. You can choose an activation function for the hidden neurons (logistic or symmetric sigmoid) and for the output neurons (logistic, symmetric sigmoid, or normalized exponential), and choose an error function (squared error or cross entropy). Note that these alternatives can be mixed and matched as you will, leading to 12 different possibilities. If you want a default suggestion, I'd say: use symmetric sigmoid for the hidden units, normalized exponential for the output, and cross entropy for the error function.
(a) Train the net with 2, 5, 10, and 15 hidden units (one network per size).
(b) Plot training error as a function of epoch.
(c) After training is complete, plot classification accuracy on the training set (i.e., the number correctly classified, not training error) for the 4 networks.
(d) After training is complete, plot classification accuracy on the test set for the 4 networks.
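For the suggested default combination (symmetric sigmoid hidden units, normalized exponential output, cross-entropy error), the forward and backward passes can be sketched as below. tanh stands in for the symmetric sigmoid, and all hyperparameters are illustrative placeholders:

```python
import numpy as np

def train_mlp(X, T, n_hidden=10, lr=0.1, epochs=50, seed=0):
    """Three-layer backprop sketch: tanh (symmetric sigmoid) hidden
    units, softmax (normalized exponential) outputs, cross-entropy
    error. Hyperparameters are illustrative, not tuned values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = T.shape[1]
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, k)); b2 = np.zeros(k)
    errors = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden layer (symmetric sigmoid)
        Z = H @ W2 + b2
        Z -= Z.max(axis=1, keepdims=True)        # shift for numerical stability
        Y = np.exp(Z); Y /= Y.sum(axis=1, keepdims=True)   # softmax output
        errors.append(-np.sum(T * np.log(Y + 1e-12)))      # cross-entropy
        d2 = Y - T                               # output delta
        d1 = (d2 @ W2.T) * (1 - H ** 2)          # backprop through tanh
        W2 -= lr * H.T @ d2 / n; b2 -= lr * d2.mean(axis=0)
        W1 -= lr * X.T @ d1 / n; b1 -= lr * d1.mean(axis=0)
    return (W1, b1, W2, b2), errors
```

Running this with n_hidden set to 2, 5, 10, and 15 gives the four networks compared in parts (a)-(d); unlike Parts 1 and 2, the hidden weights must be initialized randomly so the hidden units do not stay identical.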
Part 4 (Optional)
Train a perceptron with 10 outputs to classify the digits 0-9 into distinct classes. Using the test set, construct a confusion matrix. The 10x10 confusion matrix will specify the frequency with which each input digit is assigned to each output class. The diagonal of this matrix will indicate correct classifications.
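Given per-example true digits and predicted classes (e.g., the argmax over the 10 outputs), the confusion matrix described above is a simple tally; this helper is a sketch, not part of the assignment code:

```python
import numpy as np

def confusion_matrix(targets, predictions, n_classes=10):
    """Build an n_classes x n_classes confusion matrix.
    Rows index the true digit, columns the predicted class;
    the diagonal holds the correct classifications."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(targets, predictions):
        M[t, p] += 1
    return M
```

Dividing each row by its sum converts the counts to per-digit frequencies, and the off-diagonal cells show which digit pairs the network confuses most often.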