Description
DNN Model
● 2 fully-connected layers
● Recognize handwritten digits (0~9)
● Input dimension: 28×28
● Batch size: 60,000
● 3 types of calculation:
○ Single Layer (y = aX + b)
○ Sigmoid
○ Argmax
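As a rough illustration of the three calculations listed above, the following sequential sketch shows one way each could look for a single input sample. The function names, the row-major weight layout, and the use of float are assumptions made for this example; they are not taken from lab5.cpp, which may organize the same math differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: one fully-connected layer, y = xW + b, for one sample.
// x has in_dim entries, W is row-major (in_dim x out_dim), b and y have out_dim entries.
void single_layer(const float* x, const float* W, const float* b,
                  float* y, std::size_t in_dim, std::size_t out_dim) {
    for (std::size_t j = 0; j < out_dim; ++j) {
        float sum = b[j];                      // start from the bias
        for (std::size_t i = 0; i < in_dim; ++i) {
            sum += x[i] * W[i * out_dim + j];  // accumulate the weighted inputs
        }
        y[j] = sum;
    }
}

// Hypothetical helper: element-wise sigmoid, applied in place.
void sigmoid(float* v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = 1.0f / (1.0f + std::exp(-v[i]));
    }
}

// Hypothetical helper: index of the largest entry (the predicted digit).
// Ties resolve to the lowest index.
std::size_t argmax(const float* v, std::size_t n) {
    assert(n > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i) {
        if (v[i] > v[best]) {
            best = i;
        }
    }
    return best;
}
```

For a 28×28 input, in_dim would be 784, and the final layer would have 10 outputs so that argmax returns a digit from 0 to 9.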
Task
● The sequential code takes about 30 to 40 seconds to run.
● Please parallelize the code using OpenACC (GPU).
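One possible starting point for the OpenACC version is sketched below: a fully-connected layer fused with the sigmoid, parallelized over the batch and output dimensions. The function name, parameter list, and row-major layouts are hypothetical and not taken from lab5.cpp; the data clauses are also only one option, since keeping arrays resident on the GPU across layers with an enclosing data region would likely reduce transfers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fused layer: out = sigmoid(in * W + b) for a whole batch.
// in:  batch x in_dim, W: in_dim x out_dim, b: out_dim, out: batch x out_dim,
// all stored row-major. Without OpenACC enabled, the pragmas are ignored
// and the loops run sequentially with the same result.
void layer_sigmoid_acc(const float* in, const float* W, const float* b,
                       float* out, int batch, int in_dim, int out_dim) {
    assert(batch > 0 && in_dim > 0 && out_dim > 0);

    // Each (sample, output neuron) pair is independent, so both outer loops
    // are collapsed into one parallel iteration space.
    #pragma acc parallel loop collapse(2) \
        copyin(in[0:batch * in_dim], W[0:in_dim * out_dim], b[0:out_dim]) \
        copyout(out[0:batch * out_dim])
    for (int n = 0; n < batch; ++n) {
        for (int j = 0; j < out_dim; ++j) {
            float sum = b[j];
            // The dot product is kept sequential inside each GPU thread.
            #pragma acc loop seq
            for (int i = 0; i < in_dim; ++i) {
                sum += in[n * in_dim + i] * W[i * out_dim + j];
            }
            out[n * out_dim + j] = 1.0f / (1.0f + std::exp(-sum));
        }
    }
}
```

Fusing the sigmoid into the layer loop is a design choice of this sketch: it avoids a second pass over the output array, at the cost of diverging from a one-function-per-calculation structure.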
Template Files
● CPU sequential code: lab5.cpp
● Pre-trained model weights
● Makefile
Files are located at: /home/pp23/share/lab5/
Only my_nn() and functions invoked inside my_nn() can be modified.
Compile & Execute
● Load module (NVIDIA HPC SDK): module load nvhpc
● Compile: make
● Execute: srun --gres=gpu:1 ./lab5
● Judge: lab5-judge
The expected inference accuracy is 97.8183%
Compilation details
● nvc++ prints the loops that it successfully compiled to run on the GPU.
● Keep the file extension as .cpp; do not change it to .cu,
otherwise nvc++ will not compile the file.