Lab 1: Introduction to CUDA
Submission: Use the following subject-line format:
[CS179 Lab n] FirstName LastName
Example:
[CS179 Lab 1] Loko Kung
================================================================================
Question 1: Common Errors (20 points)
================================================================================

This class will make heavy use of low-level C constructs and concepts,
especially pointers and memory management.
As a "warm-up", here are a few quick samples of code and their intended
specifications. Each such piece of code is incorrect. Identify what is wrong
with the code, and how it should be fixed.
(Many of these problems allude to common errors encountered while writing both
GPU and CPU code.)
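As a baseline, here is a minimal sketch of the correct pattern for allocating,
checking, using, and freeing a single integer. (This is generic reference
code, not the answer to any particular part below.)

#include <stdio.h>
#include <stdlib.h>

void example() {
    /* Allocate space for one int and verify the allocation succeeded. */
    int *p = (int *) malloc(sizeof (int));
    if (!p) {
        printf("Out of memory\n");
        exit(-1);
    }
    *p = 3;             /* write through the pointer */
    printf("%d\n", *p); /* prints 3 */
    free(p);            /* release the memory when done */
}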
--------------------------------------------------------------------------------
1.1
--------------------------------------------------------------------------------
Creates an integer pointer, sets the value to which it points to 3, adds 2 to this value, and prints said value.
void test1() {
    int *a = 3;
    *a = *a + 2;
    printf("%d\n", *a);
}
--------------------------------------------------------------------------------
1.2
--------------------------------------------------------------------------------
Creates two integer pointers and sets the values to which they point to 2 and 3, respectively.
void test2() {
    int *a, b;
    a = (int *) malloc(sizeof (int));
    b = (int *) malloc(sizeof (int));
    if (!(a && b)) {
        printf("Out of memory\n");
        exit(-1);
    }
    *a = 2;
    *b = 3;
}
--------------------------------------------------------------------------------
1.3
--------------------------------------------------------------------------------
Allocates an array of 1000 integers, and for i = 0, …, 999, sets the i-th element to i.
void test3() {
    int i, *a = (int *) malloc(1000);
    if (!a) {
        printf("Out of memory\n");
        exit(-1);
    }
    for (i = 0; i < 1000; i++)
        *(i + a) = i;
}
--------------------------------------------------------------------------------
1.4
--------------------------------------------------------------------------------
Creates a two-dimensional array of size 3×100, and sets element (1,1) (counting from 0) to 5.
void test4() {
    int **a = (int **) malloc(3 * sizeof (int *));
    a[1][1] = 5;
}
--------------------------------------------------------------------------------
1.5
--------------------------------------------------------------------------------
Sets the value pointed to by a to an input, checks if the value pointed to by a is 0, and prints a message if it is.
void test5() {
    int *a = (int *) malloc(sizeof (int));
    scanf("%d", a);
    if (!a)
        printf("Value is 0\n");
}
================================================================================
Question 2: Parallelization (30 points)
================================================================================
--------------------------------------------------------------------------------
2.1
--------------------------------------------------------------------------------
Given an input signal x[n], suppose we have two output signals y_1[n] and y_2[n], given by the difference equations:
y_1[n] = x[n - 1] + x[n] + x[n + 1]
y_2[n] = y_2[n - 2] + y_2[n - 1] + x[n]
Which calculation do you expect will have an easier and faster implementation on the GPU, and why?
--------------------------------------------------------------------------------
2.2
--------------------------------------------------------------------------------
In class, we discussed how the exponential moving average (EMA), in comparison to the simple moving average (SMA), is much less suited for parallelization on the GPU.
Recall that the EMA is given by:
y[n] = c * x[n] + (1 - c) * y[n - 1]
Suppose that c is close to 1, and we only require an approximation to y[n]. How can we get this approximation in a way that is parallelizable? (Explain in words, optionally along with pseudocode or equations.)
Hint: If c is close to 1, then 1 - c is close to 0. If you expand the
recurrence relation a bit, what happens to the contribution (to y[n]) of the
terms y[n - k] as k increases?
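To make the hint concrete, unrolling the recurrence one step at a time gives
(this is just the hint written out, not a complete solution):

y[n] = c * x[n] + (1 - c) * y[n - 1]
     = c * x[n] + c * (1 - c) * x[n - 1] + (1 - c)^2 * y[n - 2]
     = c * x[n] + c * (1 - c) * x[n - 1] + c * (1 - c)^2 * x[n - 2] + ...

so the term x[n - k] contributes with weight c * (1 - c)^k, which shrinks
rapidly as k increases when c is close to 1.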
================================================================================
Question 3: Small-Kernel Convolution (50 points)
================================================================================
Introduction:
——————
Suppose we have an input signal x[n] and an impulse response h[n] that is
nonzero only over a small window of samples. We can then find y[n] by
computing the convolution, which we denote (*):

y[n] = (x (*) h)[n] = sum over k of h[k] * x[n - k]
The goal is to GPU-accelerate this computation. Similar to how we handled the
addition problem, we allocate and copy memory as appropriate, and we can use
the strategies from Lecture 2 to divide indices among our many threads; a
generic sketch of that indexing pattern follows below.
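For reference, the index-division strategy from Lecture 2 is commonly written
as a grid-stride loop. The kernel below is a hypothetical sketch (the name,
signature, and per-element work are made up for illustration; it is not the
blur.cu solution):

// Each thread handles indices i, i + stride, i + 2 * stride, ...
__global__ void exampleKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    while (i < n) {
        out[i] = in[i];  // placeholder for the real per-element work
        i += stride;
    }
}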
To do:
——————
Complete the GPU-accelerated convolution by filling in the parts marked TODO in
blur.cu.
Assignment notes:
——————
The code given to you will run the ordinary CPU version of the convolution,
report the GPU/CPU speedup, and check the correctness of the GPU output. The
default is currently set to convolve the starting signal with a Gaussian
kernel.
There are two output binaries:

- noaudio-blur: Generates the input signal x[n] randomly, with a size
  specified in the arguments.
- audio-blur: Reads the input signal x[n] from an input audio file, and
  writes the signal y[n] as an output audio file.

The default 'make all' will make both binaries.
Depending on your machine, you may need to change the CUDA_PATH variable in
the Makefile.
Because convolving with the Gaussian kernel acts as an imperfect low-pass filter, the output file (in audio mode) will have its higher frequencies attenuated. Try it out!
On haru (haru.caltech.edu), you should get a speedup of ~6-8x, using a reasonable choice of block size and #blocks (e.g. 512, 200).
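For reference, those numbers would plug into a kernel launch like the
following (generic CUDA syntax, using the hypothetical kernel sketched above;
the actual call in blur.cu may differ):

// 200 blocks of 512 threads each
exampleKernel<<<200, 512>>>(dev_in, dev_out, n);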
Hints:
——————
- The CPU code exists already (in blur.cpp, somewhere in the mess); use it as
  a guide! Recall that we often accelerate CPU code by replacing it with
  "similar-looking" GPU code. A rough sketch of such CPU code appears below.
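If you don't have blur.cpp in front of you, a naive CPU convolution with a
small kernel looks roughly like this (a sketch with assumed names and types;
the actual file will differ):

// Hypothetical reference: y[n] = sum over k of h[k] * x[n - k],
// where h has length kernel_len and x, y have length n_samples.
void cpu_blur(const float *x, const float *h, float *y,
              int n_samples, int kernel_len) {
    for (int n = 0; n < n_samples; n++) {
        y[n] = 0.0f;
        // Only terms with n - k >= 0 are valid (causal signal).
        for (int k = 0; k < kernel_len && k <= n; k++)
            y[n] += h[k] * x[n - k];
    }
}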
Technical details:
——————
- For UNIX development: To use audio mode, you'll need to install libsndfile.
  - On Ubuntu, you can use "sudo apt-get install libsndfile1-dev".