CSCI 5622 Final Exam: (worth 20% of your final mark)

 

 

Due date: December 18, 2001!!!

 

 

Your goal is to build the best model you can using the training data contained in the text file FinalTrainData.txt. This data file contains 501 examples, each having 200 input features and one classification output (which is either 0 or 1). The file is organized as follows:

 

 

Where $x_{i,j}$ is feature $j$ of training example $i$, and $y_i$ is the classification of training example $i$. You are free to use any learning algorithm to generate a model based on this data (you are not limited to the algorithms we studied or those you implemented for homework assignments). Once you have built your best model, you will use it to predict the class outputs for the inputs contained in the data file FinalTestInputData.txt. This file contains 502 examples of input features with no classifications. Its format is as follows:

 

 

You will use your best model to generate a set of predictions  where  and generate a file called TestOutput.txt that has the following format:

 

 

Note that I have the actual outputs associated with the inputs in FinalTestInputData.txt (but you don’t!).
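
Because the test classifications are withheld, one way to compare candidate models is to estimate each model's error rate by cross-validation on the training data. The sketch below is only an illustration under stated assumptions: the k-nearest-neighbour classifier, the function names `knn_predict` and `cross_val_error`, and the small synthetic data set are all made up for the example. It does not read FinalTrainData.txt, and it is not a required or recommended approach.

```python
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict 0 or 1 for `query` by majority vote of its k nearest training points."""
    # Pair each training label with its squared Euclidean distance to the query,
    # then sort so the closest examples come first.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = sum(y for _, y in dists[:k])  # number of neighbours labelled 1
    return 1 if 2 * votes > k else 0


def cross_val_error(xs, ys, n_folds=5, k=3, seed=0):
    """Estimate the misclassification rate of k-NN by n-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into n_folds roughly equal held-out sets.
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in idx if i not in held]
        train_y = [ys[i] for i in idx if i not in held]
        # Score each held-out example using only the other folds.
        for i in held_out:
            if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
                errors += 1
    return errors / len(xs)


if __name__ == "__main__":
    # Hypothetical stand-in data: two Gaussian clusters with 4 features each.
    rng = random.Random(1)
    xs = [[rng.gauss(c, 0.5) for _ in range(4)] for c in (0.0, 3.0) for _ in range(30)]
    ys = [0] * 30 + [1] * 30
    for k in (1, 3, 5):
        print("k =", k, "estimated error rate =", cross_val_error(xs, ys, k=k))
```

The same harness can score any classifier by swapping out `knn_predict`; the setting with the lowest cross-validated error is a reasonable candidate for producing the final predictions.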

 

You will email me the following (by Dec 18 please!):

 

  1. A short write-up (not more than 5 pages) describing your final model and how you went about choosing it. This part is worth 15/20 (i.e. 15% of your final mark).
  2. Your classification prediction file TestOutput.txt. This part is worth 5/20 (i.e. 5% of your final mark). I will use the following formula to calculate this part of your mark:

                                     

where the error rates are calculated as

                              

and

               

Therefore, if you produce a model that is better than mine, you can get better than 100% on the exam! But I’m not telling you what my model’s error rate is until everyone has handed in the final exam.

 

Best of luck!