Spring 2008.
Location: Wednesdays 3:00pm-5:30pm, ECCR 131
Instructor: Professor Greg Grudic
Office: ECOT 525
Office Hours: Tuesday and Thursday 2:00 to 3:00, and by appointment
Phone: 303-492-4419
Email: grudic@cs.colorado.edu
Term Project: (Due May 7, 11:55PM. Please email the project directly to me.)
The project write-up should include concise descriptions of 1) what you did,
2) why you did it, 3) your experimental or theoretical results, 4) a
conclusion, and 5) future work (if applicable). All software written for the
project should be submitted, along with all detailed experimental results if
appropriate. The default project should use either WEKA
(http://www.cs.waikato.ac.nz/ml/weka/) or other publicly available software to
analyze the following datasets: Data_Default_Project.zip (there are five data
sets: four classification and one regression). You should use at least 3
different algorithms and estimate the future error rate on each data set for
each algorithm type.
Quizzes:
1. Quiz 1: CSCI5622_quiz_1.pdf
2. Quiz 2: CSCI5622_quiz_2.pdf
3. Quiz 3: CSCI5622_quiz_3.pdf
4. Quiz 4: CSCI5622_quiz_4.pdf
Homework:
1. Homework 1: See http://www.colorado.edu/physics/pion/csci5622-spring08
2. Homework 2: See http://www.colorado.edu/physics/pion/csci5622-spring08
3. Homework 3: Map the auto data in Homework 1 into Gaussian Kernel Space. Use
the algorithms developed in Problem 4 of HW2 (set the number of cross
validation folds to 5) to choose the optimal kernel parameter (sigma) and
Nearest Neighbor parameter k in kernel space. Compare this to the k you get in
the original input space of this dataset. Report the cross validation error
rate you get by doing k Nearest Neighbor in input space, as well as in kernel
space. Which space do you think is better for this problem domain (assuming the
k Nearest Neighbor algorithm is used)? Email (to the marker,
Avleen.Bijral@colorado.edu) all Matlab code used (zipped) and your answers to
the above questions by 11:55PM on Wednesday, March 6. Make sure to normalize
your data!
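As a rough illustration of the Homework 3 workflow, the sketch below runs 5-fold cross validation for k Nearest Neighbor in input space and in a Gaussian kernel space. It is written in Python purely for illustration (submitted code must still be Matlab), and several things are assumptions rather than part of the assignment: "kernel space" is taken to mean representing each point by its vector of kernel values against the training points of the fold, the data is a small synthetic two-class set rather than the auto data, and the function names, the candidate values for k and sigma, and the fold assignment are all made up for the example.

```python
import math

def normalize(X):
    """Rescale every feature to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kernel_map(X, basis, sigma):
    """Row x becomes [k(x, b) for b in basis], with k the Gaussian kernel."""
    return [[math.exp(-sq_dist(x, b) / (2.0 * sigma ** 2)) for b in basis]
            for x in X]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)

def cv_error(X, y, k, sigma=None, folds=5):
    """Cross-validated k-NN error rate; sigma=None means plain input space."""
    wrong = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        tr_X, te_X = [X[i] for i in train], [X[i] for i in test]
        if sigma is not None:
            # Assumed mapping: the basis is the training part of this fold only.
            tr_X, te_X = (kernel_map(tr_X, tr_X, sigma),
                          kernel_map(te_X, tr_X, sigma))
        tr_y = [y[i] for i in train]
        wrong += sum(knn_predict(tr_X, tr_y, x, k) != y[i]
                     for x, i in zip(te_X, test))
    return wrong / len(X)

# Synthetic stand-in data (NOT the auto data): two separated clusters.
X = [[0.1 * i, 0.1 * (i % 5)] for i in range(20)] + \
    [[3 + 0.1 * i, 3 + 0.1 * (i % 5)] for i in range(20)]
y = [0] * 20 + [1] * 20
X = normalize(X)

ks, sigmas = [1, 3, 5], [0.5, 1.0, 2.0]   # hypothetical search grids
best_input = min((cv_error(X, y, k), k) for k in ks)
best_kernel = min((cv_error(X, y, k, s), k, s) for k in ks for s in sigmas)
print("input space  (error, k):       ", best_input)
print("kernel space (error, k, sigma):", best_kernel)
```

On real data the two spaces will generally give different best values of k, and the search grids would need to be chosen to suit the data.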
4. Homework 4: Modify the Perceptron cost/loss function to mimic the Support
Vector Machine Classification cost/loss function. You are free to use any code
that I have posted on the web. You are also free to use numerical
differentiation or analytic differentiation to implement your modified version
of the Perceptron gradient descent algorithm. Test your algorithm on synthetic
data that you design and generate to verify the algorithm. Verify the algorithm
first in linear space, for linearly separable data and for non-linearly
separable data (where one point is not linearly separable). Then verify the
algorithm on nonlinear data (that you design for the test) using the Gaussian
Kernel. The assignment is due by 11:55PM on Wednesday, April 16. It includes a
description and justification of the cost function (i.e. why it is SVM-like), a
description of your gradient descent algorithm, the Matlab code used to
generate verification data, and the Matlab code used to implement the algorithm
and the model. I will post the data to which you will apply your algorithm by
April 2. You are free to work in groups to try to understand this assignment,
but the work you hand in must be your own! You are also free to look on the web
for descriptions of such algorithms (but this is not needed). If you do this,
you must let me and the marker know which paper you are using and why. You must
also implement all algorithms in the paper on your own.
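One possible reading of "SVM-like" is sketched below, only as an illustration and not as the expected solution: the Perceptron loss max(0, -y(w.x + b)) is replaced by the hinge loss max(0, 1 - y(w.x + b)) plus an L2 penalty on w, and the result is minimized by batch subgradient descent with an analytic gradient. The code is Python for illustration only (the assignment requires Matlab). The penalty weight `lam`, the step size `lr`, the epoch count, the function names, and the six-point data set are all assumptions chosen for the example. It covers only the linear, separable case.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_hinge(X, y, lam=0.01, lr=0.05, epochs=500):
    """Batch subgradient descent on
       lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]    # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical linearly separable verification data, labels in {-1, +1}.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_hinge(X, y)
train_errors = sum(predict(w, b, x) != t for x, t in zip(X, y))
print("w =", w, "b =", b, "training errors =", train_errors)
```

Unlike the Perceptron loss, which stops updating once every point is on the correct side, the hinge term keeps pushing until points are at least a unit margin away, while the penalty keeps w small. A kernel version would apply the same update to Gaussian kernel features instead of the raw inputs.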
5. Homework 5: Due April 30, 2008, 11:55PM. Estimate the future error of a
model constructed on the data HW5_Data.zip using a support vector machine with
a radial basis function kernel. Use the LIBSVM package
(http://www.csie.ntu.edu.tw/~cjlin/libsvm/) with the Matlab Interface
(http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/matlab.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm/matlab+zip).
To estimate your error, do 50 random experiments, each time splitting the data
randomly into 10% for testing and 90% for training. The training data should be
used in a 5 fold cross validation experiment to pick the “best” radial basis
function kernel parameter gamma and the “best” value for C (use the C-SVM
implementation; i.e. set “-s 0”). Make sure to normalize your data and to find
an appropriate search range for C and gamma. Once you find your optimal C and
gamma, use the entire 90% of the data to build a single SVM model with these
parameters, and use it to obtain an error rate on the 10% test data. Report
your final estimated error on future data by averaging the error rates over the
50 experiments.
Repeat this experimental scenario by mapping the data into radial basis
function kernel space (i.e. the same kernel as for the SVM experiments), and
using the SPARSE FISHER LDA code presented in class to construct a model in
this space. Once more use the 90% training set to perform 5 fold cross
validation to pick the “best” radial basis function kernel parameter gamma, and
the number of basis functions (terms) in the SPARSE FISHER LDA model given this
gamma (to get the estimate of the number of terms in the model, average the
number of terms used in each of the 5 fold runs for a specific gamma, and round
to the nearest integer). Once these learning parameters are known, use the
entire 90% training set to build a single model for testing on the 10% test
set. Report the average error on future data of this SPARSE FISHER LDA
algorithm by averaging all your errors over the 50 random experiments.
In a single zip file, email the TA (and me) 1) all code (which must be in
Matlab) used to run these experiments, 2) instructions on how to run the code,
and 3) the final error rates for the SVM algorithm and the SPARSE FISHER LDA.
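The resampling protocol described in Homework 5 (50 random 90%/10% splits, with 5 fold cross validation on each training part to pick the learning parameters) can be sketched as follows. This is a Python illustration only; it does not call LIBSVM or the SPARSE FISHER LDA code, and submitted code must be in Matlab. The learner `fit_knn` is a hypothetical stand-in with a single parameter k, the data is synthetic, and all function names are made up for the example. In the real experiment the grid would hold (C, gamma) pairs, `fit` would train the SVM, and the data would be normalized first.

```python
import random

def inner_cv_error(train, fit, param, folds=5):
    """Cross validation error of `fit` with `param` on the training part only."""
    wrong = 0
    for f in range(folds):
        val = train[f::folds]
        rest = [r for i, r in enumerate(train) if i % folds != f]
        model = fit(rest, param)
        wrong += sum(model(x) != label for x, label in val)
    return wrong / len(train)

def estimate_future_error(data, fit, grid, runs=50, test_frac=0.1, seed=0):
    """Average test error over `runs` random train/test splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(test_frac * len(shuffled)))
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Pick the parameter using the training part only.
        best = min(grid, key=lambda p: inner_cv_error(train, fit, p))
        # Refit on the whole training part, then score once on the test part.
        model = fit(train, best)
        errors.append(sum(model(x) != label for x, label in test) / len(test))
    return sum(errors) / len(errors)

def fit_knn(train, k):
    """Hypothetical stand-in learner: k-NN on 1-D inputs."""
    def predict(x):
        nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
        votes = [label for _, label in nearest]
        return max(sorted(set(votes)), key=votes.count)
    return predict

# Synthetic stand-in data (NOT HW5_Data.zip): (input, label) pairs.
data = [(0.1 * i, 0) for i in range(30)] + [(10 + 0.1 * i, 1) for i in range(30)]

estimate = estimate_future_error(data, fit_knn, grid=[1, 3, 5])
print("estimated future error:", estimate)
```

The point of the structure is that the 10% test part is never used for parameter selection, so each of the 50 test errors is an honest sample of future error.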
Weekly Class Schedule:
1. January 16, 2008: Introduction. K Nearest Neighbor Algorithm.
2. January 23, 2008: Cross Validation, Model Selection, and Accuracy Estimation.
Please read and be ready to discuss accEst.pdf. (Guest Lecturer: Sam Reid).
See http://www.colorado.edu/physics/pion/csci5622-spring08
3. January 30, 2008: Cross Validation, Model Selection, and Accuracy Estimation.
Please read and be ready to discuss accEst.pdf. (Guest Lecturer: Sam Reid).
See http://www.colorado.edu/physics/pion/csci5622-spring08
4. February 6, 2008: Reading material for next week: Please read and be
prepared to discuss these two documents: Classification_1.pdf and
Regression_1.pdf.
5. February 13, 2008: Introduction to regression and classification.
Mercer_Kernels.pdf
6. February 20, 2008: Intro to classification (Classification_1.pdf).
Perceptron Algorithm (Perceptron.pdf).
7. February 27, 2008: Perceptron Algorithm (Perceptron.pdf)
(Perceptron_Demo.zip). Kernel Demo (Demo_Kernels.zip). Regression Demo
(Reg_Demos.zip). Fast Gaussian Kernel Calculations (Fast_Gaussian_Kernels.zip).
Linear Discriminant Analysis (LDA), Fisher's Linear Discriminant, Quadratic
Discriminant Analysis (QDA).
8. March 5, 2008: Support Vector Classification (SMV_classification.pdf).
9. March 12, 2008: Support Vector Regression (SMV_regression.pdf). Decision
Trees (Trees.pdf).
10. March 19, 2008: Linear Discriminant Analysis (LDA), Fisher's Linear
Discriminant, Quadratic Discriminant Analysis (QDA), Mixture of Gaussians
(LDA_QDA_FISHER.pdf). Neural Networks (NeuralNetwork.pdf).
11. April 2, 2008: Sparse Linear Systems. (Notes). (Code).
12. April 9, 2008: Reinforcement Learning.
13. April 16, 2008: Neural Networks (NeuralNetwork.pdf). Ensemble Learning
(Approaches_To_Supervised_Learning.pdf). Predicting Model Error
(Model_Selection_1.pdf).
14. April 23, 2008: Ensemble Learning (Approaches_To_Supervised_Learning.pdf).
Bayesian Learning. K-Means Clustering (Bayesian_1.pdf).
15. April 30, 2008: Dimensionality Reduction. Semi-Supervised Learning.
Spectral Clustering.
If you qualify for
accommodations because of a disability, please submit to me a letter from Disability
Services in a timely manner so that your needs may be addressed. Disability Services determines accommodations
based on documented disabilities.
Contact: 303-492-8671, Willard 322, and http://www.Colorado.EDU/disabilityservices.
Campus policy regarding
religious observances requires that faculty make every effort to reasonably and
fairly deal with all students who, because of religious obligations, have
conflicts with scheduled exams, assignments or required attendance. See full
details at http://www.colorado.edu/policies/fac_relig.html.
Students and faculty each
have responsibility for maintaining an appropriate learning environment. Those
who fail to adhere to such behavioral standards may be subject to discipline.
Professional courtesy and sensitivity are especially important with respect to
individuals and topics dealing with differences of race, culture, religion, politics,
sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor
with the student's legal name. I will gladly honor your request to address you
by an alternate name or gender pronoun. Please advise me of this preference
early in the semester so that I may make appropriate changes to my
records. See policies at
http://www.colorado.edu/policies/classbehavior.html and at http://www.colorado.edu/studentaffairs/judicialaffairs/code.html#student_code
The University of Colorado at
Boulder policy on Discrimination and Harassment, the University of Colorado
policy on Sexual Harassment and the University of Colorado policy on Amorous
Relationships apply to all students, staff and faculty. Any student, staff or faculty member who
believes s/he has been the subject of discrimination or harassment based upon
race, color, national origin, sex, age, disability, religion, sexual
orientation, or veteran status should contact the Office of Discrimination and
Harassment (ODH) at 303-492-2127 or the Office of Judicial Affairs at
303-492-5550. Information about the ODH,
the above referenced policies and the campus resources available to assist
individuals regarding discrimination or harassment can be obtained at http://www.colorado.edu/odh
All students of the
University of Colorado at Boulder are responsible for knowing and adhering to the
academic integrity policy of this institution. Violations of this policy may
include: cheating, plagiarism, aid of academic dishonesty, fabrication, lying,
bribery, and threatening behavior. All
incidents of academic misconduct shall be reported to the Honor Code Council
(honor@colorado.edu; 303-725-2273). Students who are found to be in violation
of the academic integrity policy will be subject to both academic sanctions
from the faculty member and non-academic sanctions (including but not limited
to university probation, suspension, or expulsion). Other information on the
Honor Code can be found at http://www.colorado.edu/policies/honor.html and at http://www.colorado.edu/academics/honorcode/