Description
Background
In this homework, we still focus on the MNIST digit classification problem. After implementing the details of MLP and CNN, you should know more about them. In fact, similar neural networks with standard structures are often encapsulated as modules in deep learning frameworks. It is convenient, and also important, to master the skills of using these modules and constructing models with deep learning frameworks.
You will be permitted to use frameworks such as TensorFlow. We hope that you can understand their characteristics (e.g. data flow graphs in TensorFlow) and implement MLP and CNN to finish the task of MNIST digit classification.
In addition, you should implement the following two techniques.
Dropout, which aims to prevent overfitting. During the training process, individual nodes are either "dropped out" of the net with probability $1-p$ or kept with probability $p$, so that a reduced network is left; incoming and outgoing edges of a dropped-out node are also removed. When testing, we would ideally like to find a sample average of all $2^n$ possible dropped-out networks; unfortunately this is unfeasible for large values of $n$. However, we can find an approximation by using the full network with each node's output weighted by a factor $p$, so the expected value of the output of any node is the same as in the training stage.
In this code, we implement dropout in an alternative way. During the training process, we scale the outputs of the remaining network nodes by $\frac{1}{p}$. At testing time, we do nothing in the dropout layer. It is easy to see that this method gives results similar to the original dropout.
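A minimal sketch of such an "inverted" dropout layer in TensorFlow 1.x is given below. The function name dropout_layer matches the one the assignment asks for, but the (incoming, drop_rate, is_train) argument convention is an assumption; tf.nn.dropout is used only because it already performs the random masking and the 1/keep_prob scaling described above.

    import tensorflow as tf

    def dropout_layer(incoming, drop_rate, is_train):
        # Inverted dropout: during training each unit is kept with probability
        # p = 1 - drop_rate and the kept activations are scaled by 1/p;
        # at test time the layer is an identity map.
        # is_train is assumed to be a boolean scalar tensor,
        # e.g. tf.placeholder(tf.bool).
        keep_prob = 1.0 - drop_rate
        return tf.cond(is_train,
                       lambda: tf.nn.dropout(incoming, keep_prob=keep_prob),
                       lambda: incoming)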
Batch normalization, which aims to deal with the internal covariate shift problem. Specifically, during the training process, the distribution of each layer's inputs changes when the parameters of the previous layers change. Researchers proposed to apply batch normalization to the input of each neuron's activation function, so that the input over each mini-batch has a mean of 0 and a variance of 1. To normalize a value $x$ across a mini-batch,

$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

where $\mu$ and $\sigma$ denote the mean and standard deviation of the mini-batch, and $\epsilon$ is a small constant to avoid dividing by zero. The transform above might limit the representation ability of the layer, thus we extend it to the following form:

$$y = \gamma \hat{x} + \beta$$

where $\gamma$ and $\beta$ are learnable parameters. For instance, the output of the hidden layer in MLP is

$$y = f(Wx + b)$$

where $f$ is the activation function. After we normalize the input to the activation function $f$, the output can be represented as

$$y = f(\mathrm{BN}(Wx + b))$$
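A minimal sketch of batch_normalization_layer for a fully-connected input in TensorFlow 1.x, under the simplifying assumption that mini-batch statistics are used at both training and test time (a complete solution would also maintain moving averages for testing):

    import tensorflow as tf

    def batch_normalization_layer(incoming, epsilon=1e-3):
        # incoming: [batch_size, num_units], the output of a linear layer
        # before the activation. gamma and beta are the learnable scale
        # and shift from the formula above.
        num_units = incoming.get_shape().as_list()[-1]
        gamma = tf.Variable(tf.ones([num_units]), name="gamma")
        beta = tf.Variable(tf.zeros([num_units]), name="beta")
        # Mini-batch mean and variance over the batch axis.
        mean, variance = tf.nn.moments(incoming, axes=[0])
        # Computes gamma * (x - mean) / sqrt(variance + epsilon) + beta.
        return tf.nn.batch_normalization(incoming, mean, variance,
                                         beta, gamma, epsilon)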
Hint:
1. Batch normalization in CNN should obey the convolutional property, so different units in the same feature map should be normalized in the same way (see the axes sketch after these hints). Refer to Reference [2].
You are allowed to use tf.layers, tf.nn or other classes in TensorFlow.
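A sketch of how the mini-batch statistics differ between the two cases, assuming the usual NHWC layout for conv feature maps; tf.nn.moments is one way to obtain them (conv_output and fc_output are illustrative tensor names):

    # CNN: a feature map of shape [batch, height, width, channels]; statistics
    # are taken over the batch and both spatial axes, giving one mean/variance
    # per channel, so every unit in a feature map is normalized identically.
    mean, variance = tf.nn.moments(conv_output, axes=[0, 1, 2])

    # MLP: a fully-connected output of shape [batch, num_units]; statistics
    # are taken over the batch axis only, giving one mean/variance per unit.
    mean, variance = tf.nn.moments(fc_output, axes=[0])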
Requirements
Python 2 or 3
TensorFlow >= 1.1
Dataset Description
Utilize load_data.py to read the training set and test set. During your training process, information about testing samples in any form should never be introduced. Note that the shapes of data are different in MLP and CNN.
MLP: To load data, use load_mnist_2d() in load_data.py.
CNN: To load data, use load_mnist_4d() in load_data.py.
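A hypothetical usage sketch; the exact arguments and return values of the loaders are defined in load_data.py, so the data directory and the return tuple below are only assumed conventions to be checked against the provided code:

    from load_data import load_mnist_2d, load_mnist_4d

    # Assumed return convention -- check load_data.py for the real one.
    X_train, X_test, y_train, y_test = load_mnist_2d("data")   # MLP: flat vectors
    X_train, X_test, y_train, y_test = load_mnist_4d("data")   # CNN: 4-D image tensors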
Python Files Description
In this homework, we provide unfinished implementations of MLP and CNN in the TensorFlow framework. Both programs share the same code structure:
main.py: the main script for running the whole program.
model.py: the model implementation, including some utility functions.
load_data.py: functions for data loading.
MLP:
You are supposed to:
1. Implement the “input – Linear – BN – ReLU – Dropout – Linear – loss” network in __init__() in model.py.
2. Implement the batch_normalization_layer() and dropout_layer() functions in model.py, and use them in step 1.
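A minimal sketch of such a graph in TensorFlow 1.x, assuming x_ is a [batch, 784] input placeholder, y_ holds integer labels, and the hidden size (256) and drop_rate are arbitrary hyper-parameter choices rather than prescribed values; batch_normalization_layer and dropout_layer are the functions you implement in step 2:

    # input -- Linear -- BN -- ReLU -- Dropout -- Linear -- loss
    hidden = tf.layers.dense(x_, 256)                        # Linear
    hidden = batch_normalization_layer(hidden)               # BN
    hidden = tf.nn.relu(hidden)                              # ReLU
    hidden = dropout_layer(hidden, drop_rate, is_train)      # Dropout
    logits = tf.layers.dense(hidden, 10)                     # Linear
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=logits))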
CNN:
You are supposed to:
1. Implement “input – Conv – BN – ReLU – Dropout – MaxPool – Conv – BN – ReLU – Dropout – MaxPool – Linear – loss” network in __init__() in model.py .
2. Implement the batch_normalization_layer() and dropout_layer() functions in model.py, and use them when constructing the convolutional and fully-connected layers in step 1.
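A minimal sketch of the corresponding graph, assuming an NHWC input x_ of shape [batch, 28, 28, 1]; the filter counts, kernel sizes and pooling sizes are placeholder choices, and the BN layer is assumed to follow the per-channel convention from the hint above:

    # input -- Conv -- BN -- ReLU -- Dropout -- MaxPool -- (repeat) -- Linear -- loss
    h = tf.layers.conv2d(x_, filters=32, kernel_size=5, padding="same")   # Conv
    h = batch_normalization_layer(h)                                      # BN
    h = tf.nn.relu(h)                                                     # ReLU
    h = dropout_layer(h, drop_rate, is_train)                             # Dropout
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2)                # MaxPool
    h = tf.layers.conv2d(h, filters=64, kernel_size=5, padding="same")    # Conv
    h = batch_normalization_layer(h)                                      # BN
    h = tf.nn.relu(h)                                                     # ReLU
    h = dropout_layer(h, drop_rate, is_train)                             # Dropout
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2)                # MaxPool
    h = tf.reshape(h, [-1, 7 * 7 * 64])                                   # flatten
    logits = tf.layers.dense(h, 10)                                       # Linear
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=logits))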
Report
In the experiment report, you need to answer the following basic questions:
1. Write down how you fill the arguments of model.forward. Explain how is_train and reuse work. Why should training and testing be different?
2. Plot the loss value (both training loss and validation loss) against the iteration number during training (a minimal plotting sketch is given after this list).
3. Construct the multi-layer perceptron and convolutional neural networks with batch normalization and dropout. Compare the differences between the results of MLP and CNN.
4. Construct MLP and CNN without batch normalization, and discuss the effects of batch normalization.
5. Tune the drop rate, and discuss the effects of dropout.
6. Explain why training loss and validation loss are different. How does the difference help you tune hyper-parameters?
Attention: For your final submission, you need to submit the MLP and CNN code with BN and dropout.
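A minimal plotting sketch for question 2, assuming you record the two losses into Python lists during training; train_losses and val_losses are illustrative names, not part of the provided code:

    import matplotlib.pyplot as plt

    # train_losses / val_losses: values recorded in main.py, e.g. one per
    # iteration (or per epoch); the names are illustrative only.
    plt.plot(train_losses, label="training loss")
    plt.plot(val_losses, label="validation loss")
    plt.xlabel("iteration")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig("loss_curve.png")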
Submission Guideline
You need to submit both the report and the code, which are:
Report: a well-formatted and readable summary including your results, discussions and ideas. Source code should not be included in the report; only a few essential lines of code are permitted for explaining complicated ideas.
You should submit a .zip file named after your student number, organized as below:
Huang Fei (黄斐), huangfei382@163.com
Reference
[1] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[2] Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, 2015: 448-456.