
Machine Learning Course

Goals. The goal of this exercise is to

• introduce you to the PyTorch platform.

• discuss the vanishing gradient problem.

EPFL, Martin Jaggi & Rüdiger Urbanke
mlo.epfl.ch/page-157255-en-html/, epfmlcourse@gmail.com

Problem 1 (PyTorch Getting Started):

Tutorials. Installation instructions: pytorch.org

We recommend using the following online tutorial:

pytorch.org/tutorials/beginner/pytorch_with_examples.html

Setup, data, and sample code. Obtain the folder labs/ex12 of the course GitHub repository

github.com/epfml/ML_course

Exercise 1: Torch

Familiarize yourself with the basics of PyTorch through the tutorial.

Exercise 2: Basic Linear Regression

• Implement prediction and loss computation for linear regression in the MyLinearRegression class.

• Implement the gradient descent steps in the train function.

• HINT: don’t forget to clear the gradients computed at previous steps.

Your output should be similar to that of Fig. 1, left.
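The steps above can be sketched as follows. This is a minimal stand-alone loop, not the provided `MyLinearRegression` template; the toy data and hyper-parameters are made up for illustration:

```python
import torch

# Toy 1-D regression data: y = 2x + 1 plus a little noise (illustrative only).
torch.manual_seed(0)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.1 * torch.randn(100, 1)

# Parameters as leaf tensors with autograd enabled.
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.5
for step in range(200):
    y_pred = x @ w + b                 # prediction
    loss = ((y_pred - y) ** 2).mean()  # mean-squared-error loss
    loss.backward()                    # compute gradients
    with torch.no_grad():
        w -= lr * w.grad               # gradient descent step
        b -= lr * b.grad
        # HINT from the sheet: clear the gradients, otherwise .backward()
        # accumulates them across steps.
        w.grad.zero_()
        b.grad.zero_()

print(loss.item())
```

The `zero_()` calls are the point of the hint: PyTorch accumulates gradients into `.grad` on every `backward()`, so forgetting to clear them silently corrupts the updates.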

Exercise 3: NN package

• Re-implement linear regression using the routines from the nn package for defining parameters and loss in the NNLinearRegression class. Does the result that you obtain differ from the previous one? If so, why?

• Combine two linear layers and a non-linearity (sigmoid or ReLU) layer to build a Multi-Layer Perceptron (MLP) with one hidden layer, in the MLP class. Find the optimal hyper-parameters for training it.

Your prediction using the MLP should be non-linear, and for a hidden size of 2 might look like Fig. 1, right.

Figure 1: Predictions made by various trained models. (a) Prediction with linear regression; (b) prediction with MLP.
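One possible sketch of the nn-package version, with the MLP variant shown as a comment. The names and hyper-parameters here are illustrative, not the ones from the provided `NNLinearRegression`/`MLP` classes:

```python
import torch
from torch import nn

# Same toy data as before (illustrative only).
torch.manual_seed(0)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.1 * torch.randn(100, 1)

# Linear regression via the nn package: the layer holds the parameters,
# the criterion replaces the hand-written loss.
model = nn.Linear(1, 1)
# One-hidden-layer MLP variant (hidden size 2, as in Fig. 1, right):
# model = nn.Sequential(nn.Linear(1, 2), nn.ReLU(), nn.Linear(2, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for step in range(200):
    optimizer.zero_grad()            # clear old gradients
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()                 # gradient descent step

print(loss.item())
```

Note that `optimizer.zero_grad()` plays the same role as zeroing `.grad` by hand in Exercise 2.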

Problem 2 (Vanishing Gradient Problem):

Over the past few years it has become more and more popular to use "deep" neural networks, i.e., networks with a large number of layers, sometimes counting in the hundreds. Empirically such networks perform very well, but they pose new challenges, in particular in the training phase. One of the most challenging aspects is the so-called "vanishing gradient" problem. This refers to the fact that the gradient with respect to the parameters of the network tends to zero, typically exponentially fast in the number of layers. This is a simple consequence of the chain rule, which says that for a composite function f(x) = g_n(g_{n-1}(··· g_1(x) ···)) the derivative has the form

$$ f'(x) = g_n'\big(g_{n-1}(\cdots g_1(x)\cdots)\big) \cdot g_{n-1}'\big(g_{n-2}(\cdots g_1(x)\cdots)\big) \cdots g_1'(x). $$

If every factor in this product is bounded in absolute value by some ρ < 1, the derivative is bounded by ρ^n and hence decays exponentially in n.

The aim of this exercise is to explore this problem.

Consider a neural net as introduced in the course with L layers, K = 3 nodes per layer, and the sigmoid function as the activation. Assume that all weights are bounded, let's say |w_ij| ≤ 1. Consider a regression task where the output layer has a single node representing a simple linear function with some bounded weights c_i, let's say |c_i| ≤ 1. Hence the overall function, call it f, is a scalar function on R^D. Show that the derivative of this function with respect to the weights vanishes exponentially fast as a function of L, at a rate of at least (3/4)^L (recall that the derivative of the sigmoid is bounded by 1/4).
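As a quick numerical illustration of the effect (a sketch, not part of the required solution), one can compose sigmoid layers of width K = 3 with weights bounded by 1 and watch the gradient with respect to the input shrink as the depth grows:

```python
import torch

# Compose L sigmoid layers of width K with entries of W in [-1, 1] and
# measure the largest input-gradient magnitude at several depths.
torch.manual_seed(0)
K, depths = 3, [2, 10, 30]
grads = []
for L in depths:
    x = torch.ones(K, requires_grad=True)
    h = x
    for _ in range(L):
        W = torch.rand(K, K) * 2 - 1   # random weights in [-1, 1]
        h = torch.sigmoid(W @ h)       # sigmoid' is at most 1/4
    f = h.sum()                        # linear output layer with c_i = 1
    f.backward()
    grads.append(x.grad.abs().max().item())

print(grads)  # magnitudes shrink sharply as L grows
```

Each layer multiplies the backward signal by a Jacobian whose entries are bounded by 1/4 in absolute value, which is the mechanism the exercise asks you to quantify.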

