CSE 517A: Machine Learning

Spring 2007


What's New?

Extra credit homework available
The extra credit homework is available now. You don't have to do this homework, but if you want some extra points, you can give it a try.
Project due date
Projects will be due at 23:59:59 on Friday, May 4th. Submissions should be in the handin bin by that time. If you submit your project later than this, we cannot guarantee that it will be graded by the time we need to turn in your grades.
The class data set is available
Get it here in MS Excel format.
Project information and homework 4
The promised information about projects is now available here. Homework 4 is available here.
Homework 3 is finally ready
Get it here. Sorry for the delay. Make sure you make a start before Spring Break. Naive Bayes on the beach is not likely to be fun...


What

This course is a broad introduction to the field of Machine Learning. We will cover a number of classic and current machine learning algorithms, and show how they can be applied to a variety of real world problems.

From the course catalog:

Formerly CS 527A. The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. Recently, many successful machine learning applications have been developed, ranging from data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users' reading preferences, to autonomous vehicles that learn to drive. There have also been important advances in the theory and algorithms that form the foundation of this field. This course will provide a broad introduction to the field of machine learning.


Who

Instructor: Bill Smart, wds@cse.wustl.edu
Office Hours: Lopata 516, by appointment
Teaching Assistants:
Stu Glaser, stuglaser@gmail.com        Hours: Sunday, 14:30-16:30, TA Grader Lounge
Yunpeng Xu, xu_yunpeng@hotmail.com        Hours: Monday & Friday 15:00-16:00, TA Grader Lounge


Where and When

Lectures will be held in Whitaker 218, on Tuesdays and Thursdays from 11:30am-1pm.


Prerequisites

There are no formal prerequisites for the class. However, we will be using some mathematics and statistics. To get the most out of the class, you should have some background in the following:
  1. The ability to write software in some reasonable language. You will be implementing some of the algorithms we talk about in class. I don't care which language you use, but you should be comfortable in at least one language.
  2. Some linear algebra. You should at least know what matrices and vectors are, and be able to perform operations on them (multiplication, inversion, etc.).
  3. Some calculus. We're going to be using derivatives, partial derivatives, and (some) integration. You should be able to take the derivatives and integrals of simple functions, and understand what these quantities actually represent.
  4. Some statistics and probability. You should know what a random variable is, and what it means to take a sample from a random variable. You should also be familiar with means and variances, the common probability distributions (such as uniform and normal), and have at least heard of hypothesis testing.


Textbook

There is one required textbook for this course: "Introduction to Machine Learning", Ethem Alpaydin, MIT Press, 2004. We will also be handing out supplementary material during the semester. You'll want to have access to a copy of the book, since I'll be referring to it extensively, and will probably be assigning homework questions from it.

Another good book that covers a subset of the material in the class is "Machine Learning", Tom Mitchell, McGraw-Hill, 1997. This is a slightly older book, and doesn't cover some of the newer material in Alpaydin. However, it's still a good reference, especially if you can pick it up second-hand.


Lectures

Meeting | Date | Topics | Reading | Homework Assigned | Links
1 | January 16 | Admin; What is machine learning? | Chapter 1 | - | -
2 | January 18 | Instance-Based Learning: k-Nearest Neighbour | Chapter 8 | - | -
3 | January 23 | Instance-Based Learning: Locally Weighted Averaging, Locally Weighted Regression | Locally Weighted Learning | - | -
4 | January 25 | Decision Trees: Basic concepts, Entropy | Chapter 9 | - | -
5 | January 30 | Evaluating and Comparing Machine Learning Algorithms | - | Homework 1 | -
6 | February 1 | Evaluating and Comparing Machine Learning Algorithms | - | - | (Hopefully) clearer notes on performance evaluation
7 | February 6 | Decision Trees: Information gain, ID3, Bias | - | - | -
8 | February 8 | Decision Trees: Overfitting and pruning, Continuous attributes, Gain ratio splits | - | - | -
9 | February 13 | Bayesian Learning: Basic concepts and Bayes Rule | - | Homework 2 | -
10 | February 15 | Bayesian Learning: ML and MAP hypotheses | - | - | -
11 | February 20 | Bayesian Learning: The k-means algorithm | - | - | -
12 | February 22 | Bayesian Learning: The EM algorithm | - | - | Recent EM results!
13 | February 27 | Artificial Neural Networks: Basic concepts and neuroscience inspiration; Perceptrons and threshold units | - | - | -
14 | March 1 | Artificial Neural Networks: Linear and non-linear units; Gradient descent learning | - | Homework 3 | -
15 | March 6 | Artificial Neural Networks: Multi-layer networks | - | - | -
16 | March 8 | Artificial Neural Networks: Other (non-feedforward) networks | - | - | -
- | March 13 | No class: Spring Break | - | - | -
- | March 15 | No class: Spring Break | - | - | -
17 | March 20 | Reinforcement Learning | - | - | -
18 | March 22 | Reinforcement Learning | - | - | -
19 | March 27 | Reinforcement Learning | - | - | -
20 | March 29 | Reinforcement Learning | - | Homework 4 | -
21 | April 3 | Genetic Algorithms | - | - | -
22 | April 5 | Genetic Algorithms | - | - | -
23 | April 10 | Dimensionality Reduction | - | - | -
24 | April 12 | Dimensionality Reduction | - | - | -
25 | April 17 | TBD | - | - | -
26 | April 19 | TBD | - | - | -
27 | April 24 | TBD | - | - | -
28 | April 26 | TBD | - | - | -
29 | May 1 | TBD | - | - | -


Homework

Number        Topics                                                    Assigned       Due
Homework 1    Instance-based algorithms                                 January 30     February 12, 23:59:59
Homework 2    Decision trees                                            February 13    February 27, 19:15
Homework 3    Bayesian learning                                         March 4        March 26, 16:59:59
Homework 4    Artificial neural networks and reinforcement learning     March 29       April 16, 17:00
Homework 5    TBD                                                       TBD            TBD


Projects

You will be required to complete a small project, worth 30% of the class grade. This project should apply machine learning to a particular problem, or perform an evaluation of a machine learning algorithm in various settings. In either case, you will be required to generate a final report that details your project, presents your results, and provides a discussion of them. There are two options for the project. You can either propose your own project to us, or select one of the predefined projects we've come up with.


Grading Policy

There will be a number of homework assignments, worth a total of 70% of the class grade. There will also be a final project, worth 30% of the class grade. Each of the homework assignments will have extra credit questions available. Grades for the class will be assigned as follows:

Score      Grade
85+ A
75+ B
65+ C
55+ D
0+ F
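
As a rough illustration of how the weights and cutoffs above combine, here is a small Python sketch. It is not the script we use for grading. The function names are made up for this example, and it assumes you already have a single overall homework percentage and a project percentage (how individual homeworks are weighted against each other is not specified here).

```python
def final_score(homework_pct: float, project_pct: float) -> float:
    """Combine homework (70%) and project (30%) into one class score.

    Assumption: both inputs are percentages on a 0-100 scale
    (extra credit may push the homework figure above 100).
    """
    return (70 * homework_pct + 30 * project_pct) / 100


def letter_grade(score: float) -> str:
    """Map a class score to a letter using the cutoffs in the table above."""
    for cutoff, grade in [(85, "A"), (75, "B"), (65, "C"), (55, "D")]:
        if score >= cutoff:
            return grade
    return "F"


if __name__ == "__main__":
    score = final_score(homework_pct=80, project_pct=90)
    print(score, letter_grade(score))
```

With these hypothetical inputs, 80% on homework and 90% on the project combine to a class score of 83, which falls in the B range.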

The late policy for the class is 10% per day late, up to a maximum of three days. If you're more than three days late on an assignment, you get zero points for that assignment. If you have some valid reason for needing more time on an assignment, then you should contact me at least two days before the deadline to request an extension. Last-minute requests will only be met in exceptional circumstances.
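
To make the late-penalty arithmetic concrete, here is a minimal Python sketch. Two things in it are assumptions rather than stated policy: that any part of a day counts as a full day late, and that the 10% is taken from the assignment's maximum points rather than from the score you earned. The function name and parameters are invented for this example; if the details matter for your submission, ask.

```python
import math


def late_adjusted_score(raw_score: float, hours_late: float,
                        max_points: float = 100.0) -> float:
    """Apply a 10%-per-day late penalty, with zero credit after three days.

    Assumptions (not stated in the policy text):
      - a partial day late is rounded up to a whole day;
      - each day costs 10% of max_points, not 10% of raw_score.
    """
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    days_late = math.ceil(hours_late / 24)
    if days_late > 3:
        return 0.0  # more than three days late: no credit
    penalty = max_points * (10 * days_late) / 100
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    for hours in (0, 5, 49, 73):
        print(hours, late_adjusted_score(90, hours))
```

Under these assumptions, a 90-point submission handed in five hours late is treated as one day late, and one handed in 73 hours late gets no credit.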


Collaboration Policy

Everything that you turn in for this class, and your answers on all of the quizzes and exams, must be your own work, unless we explicitly tell you otherwise. If you willfully misrepresent someone else's work as your own, you are guilty of cheating. Cheating, in any form, will not be tolerated in this class.

If you are guilty of cheating on any assignment, quiz, exam, or project, you will be penalized the number of points that the assignment is worth. For example, if you cheat on an assignment worth 10% of the final grade, you will receive -10% for that assignment. If you copy from someone else in the class, both parties will be penalized, regardless of which direction the information flowed. Two or more instances of cheating in the course will be referred to the School of Engineering Discipline Committee, and will result in an F in the class.

We will follow the guidelines of the University Undergraduate Academic Integrity Policy, but we reserve the right to make the final determination of what constitutes cheating for this class. If you suspect that you may be entering an ambiguous situation, it is your responsibility to clarify it before the professor or TAs detect it. If in doubt, please ask.


Page written by Bill Smart.