Assignment 5
Neural Networks and Deep Learning

CSCI 7222
Spring 2015

Assigned Mar 4
Due mid April

Assignment submission

Submit all assignments via CU's desire2learn system.  Submission instructions are as follows:
  1. Go to the desire2learn web site and enter your identikey and password, or access it through mycuinfo -> Student -> Course information -> Website
  2. Once on the desire2learn web site, select our course, which is labeled "Tpcs Nonsymbolic Ai"
  3. Select the "assessments" tab in the upper right corner of the page
  4. Select "dropbox" from the drop down menu
  5. Select "assignment 5" and then upload the file

Goals

The goal of this assignment is to hit the big time: to explore a state-of-the-art data set using a state-of-the-art neural net model. For this assignment, we will switch over to using a simulator package of your choice.

Data Set

The data set we'll use is the ongoing Data Science Bowl set available on Kaggle. This data set is a collection of labeled images of plankton and sea creatures. There are about 30k examples in the training set and 121 classes.  The classes include various sorts of "unknown" objects. Some of the classes have relatively little data, and you may decide to simply ignore these classes.
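If you do decide to ignore the sparse classes, one simple approach is to count the examples per class and drop any class below a threshold. The following is a minimal sketch, not a prescribed method: the function name `drop_rare_classes`, the (path, label) pair layout, and the threshold are all assumptions made for illustration.

```python
from collections import Counter


def drop_rare_classes(examples, min_count):
    """Keep only examples whose class has at least `min_count` examples.

    `examples` is a list of (image_path, label) pairs (a hypothetical layout).
    Returns the kept examples and the sorted list of dropped class labels.
    """
    # Count how many training examples each class label has.
    counts = Counter(label for _, label in examples)
    kept = [(path, label) for path, label in examples
            if counts[label] >= min_count]
    dropped = sorted(label for label, n in counts.items() if n < min_count)
    return kept, dropped
```

Keep in mind that a classifier trained this way never predicts the dropped classes, so any test images from those classes will be misclassified; the list of dropped labels is returned so you can see what you are giving up.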

Simulators

Although you're welcome to use your own code for these images, I suspect you won't do very well unless you have a deep convolutional net. I recommend switching to a neural net simulator. I list a variety of simulators on the course page, but I don't have any experience with these simulators. The best bet to me looks like torch7. Hopefully as you gain experience, you'll share it with your colleagues and help them to avoid choosing the wrong simulator.

Most of the simulators let you do GPU computing. The CSEL has a machine with a high-powered GPU, and all of the machines in CSEL appear to have reasonable GPUs.

The Assignment

Build the best classifier you can. You can use the Kaggle competition submission mechanism to estimate your test set performance, and once the Kaggle competition is over, we can use the remainder of the test set to rank submissions within the class (and to compare to the competition entrants). 

Insights

I suspect that it may serve you well to do some image preprocessing to put the images in a more canonical form (e.g., aligned with a central axis). Just my guess. You may also think about nontraditional and neurobiologically motivated approaches based on multiple fixations or analysis of parts (see Kanan 2013).
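One way to approach the alignment idea is to estimate each object's major axis from second-order image moments, then rotate the image so that axis lies along a fixed direction. The sketch below computes only the angle, in plain Python. The function name `principal_axis_angle` is hypothetical, and the code assumes pixel values act as foreground weights, so images with dark objects on a light background would need to be inverted first.

```python
import math


def principal_axis_angle(image):
    """Return the orientation (radians) of the major axis of a grayscale image.

    `image` is a list of rows of non-negative pixel weights, where larger
    values mean "more foreground". The angle is measured from the x (column)
    axis toward the y (row) axis and lies in [-pi/2, pi/2].
    """
    # First pass: total mass and centroid.
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, w in enumerate(row):
            total += w
            sum_x += w * x
            sum_y += w * y
    if total == 0:
        raise ValueError("image has no foreground mass")
    cx, cy = sum_x / total, sum_y / total

    # Second pass: second-order central moments about the centroid.
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, w in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += w * dx * dx
            mu02 += w * dy * dy
            mu11 += w * dx * dy

    # Standard moment-based orientation formula.
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
```

Rotating each image by the negative of this angle (with whatever image library you prefer) would bring the major axis to horizontal. The angle is ambiguous by 180 degrees, and nearly round objects have no stable major axis, so treat this as a starting point rather than a complete canonicalization.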