CS542200 – Solved



❖ Platform guide
❖ Tools
❖ Assignment

❖ Host: hades.cs.nthu.edu.tw
❖ Account: same as apollo
❖ Password: same as apollo
❖ Client (your computer): ssh hades.cs.nthu.edu.tw
❖ Frontend server without GPU cards (hades01)
❖ GPU server with GTX1080 x 2

Job Scheduler
❖ Course partition: pp23

❖ Limitation
➢ 2 GPUs
➢ 5 minutes
Instructions to compile a CUDA program
➢ Compile
➢ nvcc [options] <inputfile>
➢ e.g., nvcc cuda_code.cu -o cuda_executable
➢ If you have a Makefile, simply run make
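
To try the nvcc command above, a small source file is needed. The following is a minimal sketch of what cuda_code.cu could contain (the file name comes from the example command; the kernel `add` and the helper `addOnGpu` are hypothetical names chosen for illustration). It adds two small integer arrays on the GPU and prints the sums.

```cuda
// cuda_code.cu: a minimal program for trying out the nvcc command.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: adds two host arrays on the GPU.
// Returns true only if every CUDA call succeeded.
bool addOnGpu(const int *a, const int *b, int *c, int n) {
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up
        add<<<blocks, threads>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main() {
    const int n = 8;
    int a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 10 * i; }
    if (!addOnGpu(a, b, c, n)) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) printf("%d ", c[i]);
    printf("\n");
    return 0;
}
```

Compile it with nvcc cuda_code.cu -o cuda_executable and run the result on a node that has a GPU.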
Instructions to run a CUDA program
❖ hades02
➢ ssh hades02
➢ To specify which GPU(s) to use:
➢ export CUDA_VISIBLE_DEVICES=<gpu id>
➢ e.g., export CUDA_VISIBLE_DEVICES=1
➢ e.g., export CUDA_VISIBLE_DEVICES=0,1
❖ hades[03-07]
➢ Slurm
➢ Access GPUs with the flag --gres=gpu:<number of GPUs>
➢ e.g., srun -n 1 --gres=gpu:1 ./executable
➢ e.g., srun -n 1 --gres=gpu:2 ./executable
➢ If two GPUs are requested, they will be on the same node.

NCHC Container
❖ Webpage: https://portal.apps.edu-cloud.nchc.org.tw
❖ Tutorial: https://hackmd.io/@enmingw32/pp-nchc
❖ Register your account first
❖ GPU: RTX 3070
❖ Total available GPUs: 36
❖ Please stop your container if you aren’t using it


Start Container
Access the Container

User & Password
● User: root
● Password: student
○ The password of ssh and code-server is the same

First-time Setup Script
Open terminal in the container:
bash <(curl -s https://apollo.cs.nthu.edu.tw/pp23/setup-remote.sh)
The script will execute the following commands:
● Set proper bash config for homework and lab judger (e.g., hw3-2-judge)
● Generate ssh key and install it on Apollo (you will be prompted to enter your Apollo account name and password)
Run this script only once, even if you relaunch your container (your personal data is kept across relaunches).

Stop your container
❖ Your files located under $HOME (/root/) will be preserved
Educational use only
Please cherish the computing resources we provided
❖ In this practice, try to run the deviceQuery
❖ Steps:
➢ Hades: cp -r /home/pp23/share/lab3/deviceQuery $HOME
➢ NCHC: cp -r /tmp/dataset-nthu-pp23/pp23/share/lab3/deviceQuery $HOME
➢ cd $HOME/deviceQuery
➢ nvcc deviceQuery.cpp -o deviceQuery
❖ Run it with
➢ hades02 or NCHC container
➢ SLURM scheduler
❖ How many CUDA cores does this machine have?
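
To see where a CUDA core count comes from, the sketch below queries each visible device through the CUDA runtime API and multiplies the number of streaming multiprocessors (SMs) by the cores per SM. It is not the course's deviceQuery program. `coresPerSM` is a hypothetical helper that only covers the two compute capabilities of the GPUs named in this guide (6.1 for GTX 1080, 8.6 for RTX 3070) and returns 0 for anything else.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: FP32 CUDA cores per SM for the GPUs named in this guide.
// Returns 0 for compute capabilities this sketch does not cover.
int coresPerSM(int major, int minor) {
    if (major == 6 && minor == 1) return 128;  // Pascal, e.g. GTX 1080
    if (major == 8 && minor == 6) return 128;  // Ampere, e.g. RTX 3070
    return 0;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device visible\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        printf("Device %d: %s, compute capability %d.%d, %d SMs\n",
               d, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
        int perSM = coresPerSM(prop.major, prop.minor);
        if (perSM > 0)
            printf("  estimated CUDA cores: %d\n", perSM * prop.multiProcessorCount);
        else
            printf("  cores per SM not covered by this sketch\n");
    }
    return 0;
}
```

The number of devices listed depends on CUDA_VISIBLE_DEVICES on hades02 and on the --gres request under Slurm.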

❖ nvidia-smi: the NVIDIA System Management Interface program
❖ You can query details about
➢ GPU type
➢ GPU utilization
➢ memory usage
➢ temperature
➢ clock rate
➢ …
nvidia-smi example

❖ Error types
➢ cuda-memcheck

❖ cuda-gdb tutorial
nvprof & nsight-compute
❖ They are CUDA profilers that give you feedback on how to optimize CUDA programs
➢ nvprof ./lab3 in.png out.png
➢ -o <FILE> to save result to a file
➢ -i <FILE> to read result from a file
❖ hades: nvprof (use the “prof” partition)
❖ NCHC: nsight-compute (command: ncu)
❖ nvvp-tutorial
❖ GUI version of nvprof
❖ Useful for stream optimization
➢ Timeline

nvvp is useful for checking the concurrency of streams

Problem Description
❖ Edge Detection: Identifying points in a digital image at which the image brightness changes sharply

Sobel Operator
❖ Used in image processing and computer vision, particularly within edge detection algorithms.
❖ Uses two 3×3 filter matrices, gx and gy, which are convolved with the original image to approximate the derivatives: one for horizontal changes and one for vertical.
❖ In this lab, we use 5×5 kernels

Convolution Calculation
❖ Iterate through the width and height of the image
❖ For each pixel, multiply the filter matrix element-wise with the image neighborhood centered on that pixel and sum the products.
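
The two steps above can be sketched as a host-only reference (plain C++ that nvcc also accepts). This is a sketch under stated assumptions, not the provided lab3.cu. It uses the standard 3×3 Sobel filters rather than the lab's 5×5 kernels, a single-channel grayscale image, and it skips neighbors that fall outside the image. `sobelCpu` is a hypothetical name.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>

// Standard 3x3 Sobel filters (assumption: the lab's 5x5 kernels are not reproduced here).
const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

// Hypothetical helper: gradient magnitude for every pixel of a grayscale image.
// Neighbors outside the image are skipped, i.e. treated as zero.
std::vector<float> sobelCpu(const std::vector<unsigned char> &img, int width, int height) {
    std::vector<float> out(img.size());
    for (int y = 0; y < height; ++y) {          // iterate through the height
        for (int x = 0; x < width; ++x) {       // iterate through the width
            float sx = 0.0f, sy = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= height || xx < 0 || xx >= width) continue;
                    int p = img[yy * width + xx];
                    sx += GX[dy + 1][dx + 1] * p;  // element-wise multiply, then sum
                    sy += GY[dy + 1][dx + 1] * p;
                }
            }
            out[y * width + x] = std::sqrt(sx * sx + sy * sy);
        }
    }
    return out;
}

int main() {
    // 4x4 image with a vertical edge between columns 1 and 2.
    const int w = 4, h = 4;
    std::vector<unsigned char> img = {0, 0, 255, 255,
                                      0, 0, 255, 255,
                                      0, 0, 255, 255,
                                      0, 0, 255, 255};
    std::vector<float> out = sobelCpu(img, w, h);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) printf("%7.1f ", out[y * w + x]);
        printf("\n");
    }
    return 0;
}
```

Every output pixel depends only on the input image, so each (x, y) iteration of the two outer loops is independent of the others. That independence is what makes the computation a good fit for one GPU thread per pixel.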

Credit: https://soubhihadri.medium.com/image-processing-best-practices-c-part-2-c0988b2d3e0c
Sample Result

❖ Please do not copy the testcases
❖ lab3.cu is the CPU version (you need to rewrite it with CUDA!)
❖ Follow the hints if you are unsure how to rewrite it
How to run
❖ hades02
➢ ./lab3 <input> <output>
➢ CUDA_VISIBLE_DEVICES=0 ./lab3 <input> <output>
❖ hades[03-07]
➢ srun -n 1 --gres=gpu:1 ./lab3 <input> <output>
Check the correctness
❖ png-diff <result_file> <answer_file>
➢ It verifies the correctness of your output.
➢ result_file is the output file from your CUDA program.
➢ answer_file is the provided file for correctness checking.
■ If your input_file is “/home/pp23/share/lab3/testcases/candy.png”, your answer_file is “/home/pp23/share/lab3/testcases/candy.out.png”

Your code is correct if you see “ok, 100.00%”
Hints
❖ Allocate memory on the GPU
❖ Copy the original image to the GPU
❖ Put the filter matrix in device memory (or declare it on the device)
❖ Copy the filter matrix to shared memory (don’t let only one thread do it)
❖ Parallelize the Sobel computation
❖ Copy the results from device to host
❖ Free memory that is no longer needed
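
The hints can be sketched end to end as follows. This is a minimal illustration, not the reference solution. It assumes the standard 3×3 Sobel filters instead of the lab's 5×5 kernels, a single-channel grayscale image, out-of-range neighbors skipped, and the magnitude clamped to 255. The names `sobelKernel`, `sobelGpu`, `dGx`, and `dGy` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define K 3  // filter width (assumption: the lab uses 5x5 kernels)

// Filter matrices declared directly on the device, in constant memory.
__constant__ int dGx[K * K] = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
__constant__ int dGy[K * K] = {-1, -2, -1, 0, 0, 0, 1, 2, 1};

__global__ void sobelKernel(const unsigned char *in, unsigned char *out,
                            int width, int height) {
    // The threads of a block share the copy into shared memory.
    __shared__ int sGx[K * K], sGy[K * K];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    for (int i = tid; i < K * K; i += blockDim.x * blockDim.y) {
        sGx[i] = dGx[i];
        sGy[i] = dGy[i];
    }
    __syncthreads();  // every thread reaches this before any early return

    // One thread per pixel.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sx = 0.0f, sy = 0.0f;
    for (int dy = -K / 2; dy <= K / 2; ++dy) {
        for (int dx = -K / 2; dx <= K / 2; ++dx) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
            int p = in[yy * width + xx];
            int f = (dy + K / 2) * K + (dx + K / 2);
            sx += sGx[f] * p;
            sy += sGy[f] * p;
        }
    }
    float mag = sqrtf(sx * sx + sy * sy);
    out[y * width + x] = mag > 255.0f ? 255 : (unsigned char)mag;
}

// Hypothetical host wrapper: returns true only if every CUDA call succeeded.
bool sobelGpu(const std::vector<unsigned char> &img,
              std::vector<unsigned char> &result, int width, int height) {
    unsigned char *dIn = nullptr, *dOut = nullptr;
    size_t bytes = img.size();
    result.resize(bytes);
    bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess &&   // allocate on the GPU
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dIn, img.data(), bytes,          // copy the image to the GPU
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        sobelKernel<<<grid, block>>>(dIn, dOut, width, height);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(result.data(), dOut, bytes,       // copy the results back
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);   // free device memory
    cudaFree(dOut);
    return ok;
}

int main() {
    const int w = 4, h = 4;  // 4x4 image with a vertical edge
    std::vector<unsigned char> img = {0, 0, 255, 255, 0, 0, 255, 255,
                                      0, 0, 255, 255, 0, 0, 255, 255};
    std::vector<unsigned char> out;
    if (!sobelGpu(img, out, w, h)) { printf("CUDA error\n"); return 1; }
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) printf("%3d ", (int)out[y * w + x]);
        printf("\n");
    }
    return 0;
}
```

With a filter this small, the shared-memory copy mostly shows the cooperative-loading pattern. The strided loop lets any block size cover all K×K entries without a single thread doing all the work.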
● The judge will execute your code with a single process and a single GPU
● Submit your code and Makefile (optional) to eeclass before 11/16 23:59
● Use lab3-judge (only available on hades)
● Get started as soon as possible to avoid heavy queueing delays
● Try to write your code on NCHC, then fine-tune it on Hades

