-
Design a two-input threshold unit that computes the boolean function
"A and not B". Describe your representations, and show
the weights in your unit. You should do this by hand, and not use
any of the algorithms we talked about in class.
Design a two-layer network of threshold units that computes "A
xor B". Again, do it by hand. [10 points]
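As a point of reference for the representation question, here is a minimal sketch of a single threshold unit, assuming inputs and outputs are encoded as 0 and 1. The weights shown are hypothetical and compute "A and B", which is deliberately not one of the functions asked for above; the names threshold_unit and unit_and are illustrative only.

```python
def threshold_unit(weights, bias, inputs):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

# Hypothetical hand-picked weights for "A and B" with a 0/1 encoding.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

def unit_and(a, b):
    return threshold_unit(AND_WEIGHTS, AND_BIAS, [a, b])

# Enumerate the full truth table to check the unit by hand.
truth_table = [(a, b, unit_and(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in truth_table:
    print(a, b, out)
```

Checking all four input pairs like this is a quick way to confirm hand-chosen weights. A different encoding (for example -1 and +1) would need different weights.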
-
Design a gradient descent learning rule for a single linear unit
with an output, o, where
Use the derivation of the update rule for a standard linear unit as
a starting point. [10 points]
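For reference, the starting point mentioned above can be sketched as follows. This assumes the usual squared-error criterion over a training set D, with t_d the target and o_d the unit's output for example d, x_{id} the i-th input of example d, and \eta the learning rate; this notation is an assumption and may differ from what was used in class.

```latex
% Standard linear unit: o_d = \vec{w} \cdot \vec{x}_d
E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

\frac{\partial E}{\partial w_i}
  = \sum_{d \in D} (t_d - o_d) \, \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d)
  = - \sum_{d \in D} (t_d - o_d) \, x_{id}

\Delta w_i = - \eta \, \frac{\partial E}{\partial w_i}
  = \eta \sum_{d \in D} (t_d - o_d) \, x_{id}
```

For a different output function, only the partial derivative of o_d with respect to w_i changes; the rest of the derivation has the same shape.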
-
Implement a linear unit, using the learning rule we talked about in
class. Pick a target function (linear, with at least three input
variables, and one output variable), and use it to generate training
data. Train the linear unit on these data, and make a graph of the
mean squared error (MSE) against the number of training points. Repeat the process with
noisy training data (i.e., add some zero-mean noise to the output of
each training point), and report on what happens. You should try a
variety of learning rates.
For this assignment, you don't need to explicitly generate test and
training files. You can simply generate a training (or testing)
point and process it through the network. When you're computing
the MSE, you probably want to use a large number (say 1,000) of test
points. [20 points]
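The experiment described above might be organized along these lines. This is a minimal sketch, assuming the learning rule from class is the incremental delta (LMS) rule, that inputs are drawn uniformly from [-1, 1], and that the noise is Gaussian. The target weights, function names, and parameter values are all hypothetical.

```python
import random

def make_point(rng, true_w, noise_sd=0.0):
    """Draw one input vector (with a constant bias input) and its target."""
    x = [1.0] + [rng.uniform(-1.0, 1.0) for _ in range(len(true_w) - 1)]
    t = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, noise_sd)
    return x, t

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, true_w, rng, n_test=1000):
    """Mean squared error on freshly generated, noise-free test points."""
    total = 0.0
    for _ in range(n_test):
        x, t = make_point(rng, true_w)
        total += (t - predict(w, x)) ** 2
    return total / n_test

def train(true_w, eta, n_train, noise_sd=0.0, seed=0, report_every=100):
    """Train on one generated point at a time; return weights and MSE curve."""
    rng = random.Random(seed)
    w = [0.0] * len(true_w)
    curve = [(0, mse(w, true_w, rng))]
    for n in range(1, n_train + 1):
        x, t = make_point(rng, true_w, noise_sd)
        err = t - predict(w, x)
        # Delta rule: move each weight along its input, scaled by the error.
        w = [wi + eta * err * xi for wi, xi in zip(w, x)]
        if n % report_every == 0:
            curve.append((n, mse(w, true_w, rng)))
    return w, curve

TRUE_W = [0.5, 2.0, -1.0, 3.0]  # hypothetical target: bias plus three inputs
w, curve = train(TRUE_W, eta=0.05, n_train=2000)
for n, err in curve[:5]:
    print(n, err)
```

The curve list holds (number of training points, MSE) pairs ready for plotting. Passing a nonzero noise_sd, or a different eta, to train gives the other runs the question asks about.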
- Use a backpropagation network to learn to recognize faces.
For this question, we will be using some code and images from Tom
Mitchell's web site. You should attempt
steps 1 through 8 (inclusive) in section 2.1 of the assignment on
this web page. For your convenience, we've got a compiled version
of the code in
~cse517/public/hw4/code/facetrain, and
the two data sets (large and small images) in
~cse517/public/hw4/data on the CEC Linux machines. [30
points]
-
Design a simple MDP that shows how the choice of the discount factor
affects the final learned policy. For your MDP, write down the
Q-values of each node as functions of the discount factor, and
describe how the final policy changes as the value of the discount
factor changes. Discuss what this means. [10 points]
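To make the role of the discount factor concrete, the following sketch computes the discounted return of a finite reward sequence. The two reward sequences are hypothetical and are not a worked answer; they only show how the comparison between a near and a far reward can flip as the discount factor changes.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

near = [1.0]                 # hypothetical: small reward immediately
far = [0.0, 0.0, 0.0, 5.0]   # hypothetical: larger reward after three steps

for gamma in (0.5, 0.9):
    print(gamma, discounted_return(near, gamma), discounted_return(far, gamma))
```

With these numbers the immediate reward is worth more at a discount factor of 0.5, and the delayed reward is worth more at 0.9.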
-
In class we talked about value-function approximation (VFA), where
the table in Q-learning is replaced by a function approximator. For
two different function approximators that you know, briefly discuss
how they might be used for VFA, if they are appropriate, and any
possible problems that using them might cause. Think about cost of
training and prediction, storage requirements, and other practical
considerations. You should choose two different function
approximators; if you choose locally-weighted averaging and
locally-weighted regression, you might not get many points. [10
points]
-
Implement Q-learning for a simple two-dimensional gridworld, with a
10 by 10 grid. Follow the example given in class. Assume that the
goal is in the upper right corner, and that you have four actions
(up, down, left, right). Use a sparse reward scheme, with a reward
for getting to the goal state, and penalties for hitting the walls.
Use a discount factor of 0.99, and experiment with different
learning rates.
Run your code, and print out the initial policy, an intermediate
policy, and the final policy. You can do this easily by writing out
a 10 by 10 grid of single characters, representing policy actions.
For example,
rrrll
rrull
rrull
ruuul
ruull
is a possible policy for a 5 by 5 world. Use Monte Carlo sampling,
and report on how many sample points you need to converge to the
final policy. [40 points]
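The update step and the policy printout described above could be sketched as follows. This assumes a dictionary-based Q-table keyed by (row, column) with row 0 at the top, the standard one-step tabular Q-learning backup, and ties broken in the order u, d, l, r. The names q_update and render_policy are hypothetical, and the example uses a 3 by 3 grid only to keep the output short.

```python
ACTIONS = "udlr"  # up, down, left, right

def q_update(q, state, action, reward, next_state, alpha, gamma=0.99):
    """One backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

def render_policy(q, size):
    """Return the greedy policy as a list of strings, top row first."""
    rows = []
    for row in range(size):
        rows.append("".join(
            max(ACTIONS, key=lambda a: q[(row, col)][a]) for col in range(size)
        ))
    return rows

size = 3  # small hypothetical grid; the assignment uses 10
q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(size) for c in range(size)}

# One sample: moving right from (0, 1) reaches the goal corner with reward 1.
q_update(q, (0, 1), "r", 1.0, (0, 2), alpha=0.5)
print("\n".join(render_policy(q, size)))
```

Because every Q-value starts at zero, the initial policy printed this way is all "u"; a randomized tie-break would give a less misleading picture of an untrained table.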