Description
CSCI-GA 2572 Deep Learning
The goal of homework 1 is to help you understand how to update network parameters using the backpropagation algorithm.
For part 1, you need to answer the questions with mathematical equations. Put all your answers in a PDF file; we will not accept any scanned hand-written answers. It is recommended to use LaTeX.
For part 2, you need to program in Python. You are required to implement your own forward and backward passes without using autograd, and to submit your mlp.py file for this part. Your submission should include:
• theory.pdf
• mlp.py
The following behaviors will result in a penalty to your final score:
1. 5% penalty for submitting your files in the wrong format (including misnaming the zip, PDF, or Python file, or adding extra files to the zip folder, such as the test scripts from part 2).
2. 20% penalty for late submission within the first 24 hours. We will not accept any late submissions after the first 24 hours.
3. 20% penalty for a code submission that cannot be executed using the steps described in part 2, so please test your code before submitting it.
1 Theory (50pt)
To answer the questions in this part, you need some basic knowledge of linear algebra and matrix calculus. You must also follow these instructions:
1. Every vector is treated as a column vector.
2. You must use numerator-layout notation for matrix calculus. Please refer to Wikipedia for details on this notation.
3. You are only allowed to use vectors and matrices. You cannot use tensors in any of your answers.
4. A missing transpose is considered a wrong answer.
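For concreteness, numerator layout means that for y ∈ R^m differentiated with respect to x ∈ R^n, the Jacobian is laid out with one row per component of the numerator:

```latex
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
\in \mathbb{R}^{m \times n},
\qquad
\frac{\partial s}{\partial \mathbf{x}} \in \mathbb{R}^{1 \times n}
\ \text{for a scalar } s.
```

In particular, the derivative of a scalar with respect to a column vector is a row vector under this convention.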
1.1 Two-Layer Neural Nets
You are given the following neural net architecture:
Linear1 → f → Linear2 → g
where Linear_i(x) = W^(i)x + b^(i) is the i-th affine transformation, and f, g are element-wise nonlinear activation functions. When an input x ∈ R^n is fed to the network, the output ŷ ∈ R^K is obtained.
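Written out in full, the network therefore computes the following composition (here d denotes the hidden width, a symbol introduced for illustration only):

```latex
\hat{y} = g\!\left( W^{(2)} \, f\!\left( W^{(1)} x + b^{(1)} \right) + b^{(2)} \right),
\qquad
W^{(1)} \in \mathbb{R}^{d \times n},\;
b^{(1)} \in \mathbb{R}^{d},\;
W^{(2)} \in \mathbb{R}^{K \times d},\;
b^{(2)} \in \mathbb{R}^{K}.
```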
1.2 Regression Task
We would like to perform a regression task. We choose f(·) = (·)⁺ = ReLU(·) and g to be the identity function. To train this network, we choose the MSE loss function ℓ_MSE(ŷ, y) = ‖ŷ − y‖², where y is the target output.
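As a quick numerical illustration of this loss (a sketch, not part of the required solution), the squared Euclidean norm is a plain sum of squared differences:

```python
import torch

# Quick numerical check of the loss as written above: ||y_hat - y||^2 is the
# plain sum of squared differences (no averaging over dimensions).
y_hat = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 0.0, 3.0])
loss = torch.sum((y_hat - y) ** 2)   # 0^2 + 2^2 + 0^2 = 4
```

Beware that PyTorch's built-in torch.nn.functional.mse_loss averages over elements by default (reduction='mean'), which differs from the definition above by a factor of K.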
(a) Name and mathematically describe the 5 programming steps you would take to train this model with PyTorch using SGD on a single batch of data.
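For orientation only (your answer must name and mathematically describe the steps, not just code them), one single-batch SGD iteration in PyTorch typically has the shape below; the layer sizes and data are arbitrary illustrative placeholders:

```python
import torch

# Illustrative only: the five canonical steps of one SGD update on a single
# batch. Layer sizes (3 -> 4 -> 2) and the random data are placeholders.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3), torch.randn(8, 2)

optimizer.zero_grad()                           # 1. clear old gradients
y_hat = model(x)                                # 2. forward pass
loss = torch.nn.functional.mse_loss(y_hat, y)   # 3. compute the loss
loss.backward()                                 # 4. backward pass
optimizer.step()                                # 5. parameter update
```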
(b) For a single data point (x, y), write down all inputs and outputs of the forward pass for each layer. You can only use the variables x, y, W^(1), b^(1), W^(2), b^(2) in your answer (note that Linear_i(x) = W^(i)x + b^(i)).
Layer   | Input | Output
Linear1 |       |
f       |       |
Linear2 |       |
g       |       |
Loss    |       |
(c) Write down the gradients calculated from the backward pass. You can only use the following variables: x, y, W^(1), b^(1), W^(2), b^(2), ∂ℓ/∂ŷ, ∂z2/∂z1, ∂ŷ/∂z3 in your answer, where z1, z2, z3, ŷ are the outputs of Linear1, f, Linear2, g, respectively.
Parameter | Gradient
W^(1)     |
b^(1)     |
W^(2)     |
b^(2)     |
(d) Show us the elements of ∂z2/∂z1 and ∂ŷ/∂z3 (be careful about the dimensionality).
1.3 Classification Task
We would like to perform a multi-class classification task, so we set both f, g = σ, the logistic sigmoid function σ(z) = (1 + exp(−z))⁻¹.
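As a sanity check of this definition (a sketch based directly on the formula above; the function names are illustrative, not the required mlp.py API):

```python
import torch

# Element-wise sigmoid and its derivative, straight from the definition
# sigma(z) = 1 / (1 + exp(-z)). Names are illustrative placeholders.
def sigmoid(z):
    return 1.0 / (1.0 + torch.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)        # sigma'(z) = sigma(z) * (1 - sigma(z))
```

Note that the derivative is largest at z = 0, where it equals 1/4, and shrinks toward zero for large |z|.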
(a) If you want to train this network, what do you need to change in the equations of (b), (c) and (d), assuming we are using the same MSE loss function?
(b) Now you think you can do a better job by using a Binary Cross Entropy (BCE) loss function ℓ_BCE(ŷ, y) = (1/K) Σ_{i=1}^{K} −[y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]. What do you need to change in the equations of (b), (c) and (d)?
(c) Things are getting better. You realize that not all intermediate hidden activations need to be binary (or soft version of binary). You decide to use f(·)=(·)+ but keep g as σ. Explain why this choice of f can be beneficial for
training a (deeper) network.
2 Implementation (50pt)
You need to implement the forward pass and backward pass for Linear, ReLU, Sigmoid, MSE loss, and BCE loss in the attached mlp.py file. We provide three example test cases test1.py, test2.py, test3.py. We will test your implementation with other hidden test cases, so please create your own test cases to make sure your implementation is correct.
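One common way to build such test cases is a finite-difference gradient check: compare your analytic backward pass against a numerical estimate of the gradient. A minimal sketch (the helper name and example function below are illustrative, not part of the required API):

```python
import torch

# Minimal central-difference gradient check. f maps a tensor w to a Python
# float; the returned tensor estimates df/dw element by element.
def numerical_grad(f, w, eps=1e-5):
    grad = torch.zeros_like(w)
    flat = w.view(-1)           # shares storage with w
    for i in range(flat.numel()):
        orig = flat[i].item()
        flat[i] = orig + eps
        f_plus = f(w)
        flat[i] = orig - eps
        f_minus = f(w)
        flat[i] = orig          # restore the original value
        grad.view(-1)[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Example: f(w) = ||w||^2 has analytic gradient 2w, so the numerical
# estimate should match 2w closely (double precision keeps noise small).
w = torch.tensor([1.0, -2.0, 3.0], dtype=torch.float64)
num = numerical_grad(lambda v: torch.sum(v ** 2).item(), w)
```

Checking each of your layer and loss backward functions against such an estimate on random inputs is an effective way to catch transpose and broadcasting bugs before submission.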
Extra instructions:
1. Please use Python version ≥ 3.7 and PyTorch version 1.7.1. We recommend using Miniconda to manage your virtual environment.
2. We will put your mlp.py file in the same directory as the hidden test scripts and use the command python hiddenTestScriptName.py to check your implementation. So please make sure the file is named mlp.py and can be executed with the example test scripts we provided.
3. You are not allowed to use PyTorch autograd functionality in your implementation.
4. Be careful about the dimensionality of vectors and matrices in PyTorch. It does not necessarily follow the math you derived in part 1.