CSE 517A: Machine Learning

Spring 2007: Homework 4

Due 5pm, April 16, 2007

Purpose

The purpose of this lab is to give you some experience with artificial neural networks and reinforcement learning. You should attempt at least 70 points worth of questions from this assignment. Anything extra you attempt will be considered to be extra credit. Question 4 is not nearly as bad as you think it is, but it will make more time than you expect to complete all of the computation (during which you can be doing something else). Start early! Remember, you only need to do 70 points worth of problems.

Assignment

  1. Design a two-input threshold unit that computes the boolean function "A and not B". Describe your representations, and show the weights in your unit. You should do this by hand, and not use any of the algorithms we talked about in class.

    Design a two-layer network of threshold units that computes "A xor B". Again, do it by hand. [10 points]

  2. Design a gradient descent learning rule for a single linear unit with an output, o, where
    Use the derivation of the update rule for a standard linear unit as a starting point. [10 points]
  3. Implement a linear unit, using the learning rule we talked about in class. Pick a target function (linear, with at least three input variables, and one output variable), and use it to generate training data. Train the linear unit on these data, and make a graph of the MSE against the number of training points. Repeat the process with noisy training data (i.e., add some zero-mean noise to the output of each training point), and report on what happens. You should try a variety of learning rates.

    For this assignment, you don't need to explicitly generate test and training files. You can simply generate a training (or testing) point and process it through the network. When you're generating the MSE, you probably want to use a large number (say 1,000) test points. [20 points]

  4. Use a backpropagation network to learn to recognize faces. For this question, we will be using some code and images from Tom Mitchell's web site. For this question, you should attempt steps 1 through 8 (inclusive) in section 2.1 of the assignment on this web page. For your convenience, we've got a compiled version of the code in ~cse517/public/hw4/code/facetrain, and the two data sets (large and small images) in ~cse517/public/hw4/data on the CEC linux machines. [30 points]
  5. Design a simple MDP that shows how the choice of the discount factor affects the final learned policy. For your MDP, write down the Q-values of each node as functions of the discount factor, and describe how the final policy changes as the value of the discount factor changes. Discuss what this means. [10 points]
  6. In class we talked about value-function approximation (VFA), where the table in Q-learning is replaced by a function approximator. For two different function approximators that you know, briefly discuss how they might be used for VFA, if they are appropriate, and any possible problems that using them might cause. Think about cost of training and prediction, storage requirements, and other practical considerations. You should choose two different function approximators; if you choose locally-weighted averaging and locally-weighted regression, you might not get many points. [10 points]
  7. Implement Q-learning for a simple two-dimensional gridworld, with a 10 by 10 grid. Follow the example given in class. Assume that the goal is in the upper right corner, and that you have four actions (up, down, left, right). Use a sparse reward scheme, with a reward for getting to the goal state, and penalties for hitting the walls. Use a discount factor of 0.99, and experiment with different learning rates.

    Run your code, and print out the initial policy, an intermediate policy, and the final policy. You can do this easily by writing out a 10 by 10 grid of single characters, representing policy actions. For example,

    rrrll
    rrull
    rrull
    ruuul
    ruull
    is a possible policy for a 5 by 5 world. Use monte-carlo sampling, and report on how many sample points you need to converge to the final policy. [40 points]

What to Hand In

Turn in a paper copy of your homework, to the hand-in bin. Also turn in an electronic copy of your code. Please, please, please save the trees. Don't include reams of output from a program, since we will not read it. Only include data that are useful and readable. If you really feel driven to let us see the verbose output of your code, send it in electronic format. That way, at least, we can search in it electronically.
Page written by Bill Smart.