Martin Jaggi & Rüdiger Urbanke mlo.epfl.ch/page-157255-en-html/ epfmlcourse@gmail.com
Goals. The goal of this exercise is to
• Better understand neural networks
• Implement the feed-forward function and backpropagation in a simple neural net.
Setup, data and sample code. Obtain the folder labs/ex13 of the course GitHub repository
github.com/epfml/ML_course
In the following problems, we will use a very simple neural network: a three-layer net with an input layer of size D = 4, a single hidden layer (L = 1) of size K = 5, and an output layer of size 1, as shown in Figure 1.
Figure 1: A simple neural network with an input layer, one hidden layer, and an output layer.
Problem 1 (Feed-forward in neural networks):
In our simplified neural network, the feed-forward function is given by

x^{(1)} = φ((W^{(1)})^⊤ x^{(0)} + b^{(1)}), (1)

ŷ = φ((w^{(2)})^⊤ x^{(1)} + b^{(2)}), (2)

where x^{(0)} ∈ R^D is the input, W^{(1)} ∈ R^{D×K} is the first-layer weight matrix with entries w^{(1)}_{i,j}, and w^{(2)} ∈ R^K is the second-layer weight vector with entries w^{(2)}_j.
Use Equations (1) and (2) to fill in the corresponding template function in the notebook, and pass the test. For simplicity, in the following questions, let the bias terms be 0 and use the sigmoid as the activation function φ(·).
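As a rough illustration of Equations (1) and (2) (not the notebook's actual template, whose function signature may differ), a feed-forward pass with zero biases could look like the sketch below; the names sigmoid, feed_forward, W1, and w2 are assumptions made here.

import numpy as np

def sigmoid(t):
    """Sigmoid activation φ(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def feed_forward(x0, W1, w2):
    """Forward pass of the small network (D = 4 inputs, K = 5 hidden units, 1 output),
    with the bias terms set to 0.

    x0: input vector x^(0), shape (D,)
    W1: first-layer weights w^(1)_{i,j}, shape (D, K)
    w2: second-layer weights w^(2)_j, shape (K,)
    Returns the hidden activations x^(1) and the prediction y_hat.
    """
    x1 = sigmoid(W1.T @ x0)   # Equation (1): hidden layer, shape (K,)
    y_hat = sigmoid(w2 @ x1)  # Equation (2): scalar output
    return x1, y_hat

# Example with the dimensions from Figure 1 (D = 4, K = 5).
rng = np.random.default_rng(0)
x1, y_hat = feed_forward(rng.normal(size=4), rng.normal(size=(4, 5)), rng.normal(size=5))
print(x1.shape, y_hat)  # (5,) and a scalar in (0, 1)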
Problem 2 (Backpropagation in neural networks):
Assume that we use the squared error as our loss function, as shown in Equation (3):

L(w) = (1/2)(ŷ − y)^2, (3)

where we have only one sample in our case, and y is the true value while ŷ is the network prediction.
Evaluate the derivative of L(w) with respect to the weights w^{(2)}_j and w^{(1)}_{i,j}, and implement the corresponding function in the notebook.
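A common way to check a backpropagation implementation is to compare it against a finite-difference estimate of the gradient. The sketch below assumes the sigmoid network of Equations (1)–(2) with zero biases and the loss of Equation (3); the names loss, numerical_grad_W1, W1, and w2 are illustrative, not part of the notebook.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss(W1, w2, x0, y):
    """Squared-error loss of Equation (3) after the forward pass of Equations (1)-(2),
    with the bias terms set to 0."""
    x1 = sigmoid(W1.T @ x0)   # hidden activations
    y_hat = sigmoid(w2 @ x1)  # scalar prediction
    return 0.5 * (y_hat - y) ** 2

def numerical_grad_W1(W1, w2, x0, y, eps=1e-6):
    """Central finite-difference estimate of dL/dw^(1)_{i,j}, entry by entry."""
    grad = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            W_plus, W_minus = W1.copy(), W1.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (loss(W_plus, w2, x0, y) - loss(W_minus, w2, x0, y)) / (2 * eps)
    return grad

# Compare your analytical backpropagation gradient against this estimate;
# the two should agree up to roughly 1e-8.
rng = np.random.default_rng(0)
W1, w2 = rng.normal(size=(4, 5)), rng.normal(size=5)
x0, y = rng.normal(size=4), 1.0
print(numerical_grad_W1(W1, w2, x0, y))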
Problem 3 (Effect of regularization):
What is the effect of regularization on the weights? To get some insight, let Θ be the vector of all weights in the neural network. Recall that we do not penalize the bias terms; therefore, let us ignore them in the following. Let Θ∗ be the parameter vector that minimizes the cost function L on the given training set (where the cost function does not include the regularization). We would like to study how the optimal weights change if we include some regularization.
In order to make the problem tractable, assume that L(Θ) can be locally expanded around the optimal parameter Θ∗ in the form

L(Θ) ≈ L(Θ∗) + (1/2)(Θ − Θ∗)^⊤ H (Θ − Θ∗)

(there is no first-order term because the gradient of L vanishes at the minimizer Θ∗), where H is the Hessian of L at Θ∗, whose components are the entries

H_{i,j} = ∂²L(Θ) / (∂Θ_i ∂Θ_j), evaluated at Θ = Θ∗.
Now add a regularization term of the form (µ/2)‖Θ‖² = (µ/2) Θ^⊤Θ.
1. Show that the optimum weight vector for the regularized problem is given by

Q(Λ + µI)^{−1} Λ Q^⊤ Θ∗,

where H = QΛQ^⊤ is the SVD of the symmetric matrix H, Q is an orthogonal matrix, and Λ is a diagonal matrix whose entries are non-negative and decreasing along the diagonal. (A numerical sanity check of this formula is sketched after this problem.)
2. Show that (Λ + µI)^{−1}Λ is again a diagonal matrix whose i-th diagonal entry is λ_i/(λ_i + µ).
3. Argue that along the dimensions of eigenvectors of H that correspond to large eigenvalues λ_i, essentially no change occurs in the weights, whereas along the dimensions of eigenvectors with very small eigenvalues, the weights are drastically decreased.
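The claim in part 1 can also be sanity-checked numerically: the sketch below draws a random positive semi-definite H and a random Θ∗, minimizes the regularized quadratic by gradient descent, and compares the result with Q(Λ + µI)^{−1}ΛQ^⊤Θ∗. The variable names are illustrative, and the check is of course no substitute for the derivation.

import numpy as np

rng = np.random.default_rng(0)
n, mu = 6, 0.1

# A random symmetric positive semi-definite "Hessian" H and an unregularized optimum Theta*.
A = rng.normal(size=(n, n))
H = A @ A.T
theta_star = rng.normal(size=n)

def reg_grad(theta):
    """Gradient of 1/2 (Theta - Theta*)^T H (Theta - Theta*) + mu/2 ||Theta||^2."""
    return H @ (theta - theta_star) + mu * theta

# Minimize the regularized quadratic directly by plain gradient descent.
theta = np.zeros(n)
step = 1.0 / (np.linalg.eigvalsh(H).max() + mu)  # safe step size for this quadratic
for _ in range(20_000):
    theta -= step * reg_grad(theta)

# The closed form claimed in part 1: Q (Lambda + mu I)^{-1} Lambda Q^T Theta*,
# where H = Q Lambda Q^T with orthogonal Q and non-negative eigenvalues.
lam, Q = np.linalg.eigh(H)
theta_claim = Q @ np.diag(lam / (lam + mu)) @ Q.T @ theta_star

print(np.allclose(theta, theta_claim))  # should print True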