Description
In this homework, we will study the problem of finding the “median” of an array of n integers. The median of an array of integers is the middle element of the array if the numbers in the array are sorted. For example, consider the array A = [ 3, 13, 39, 24, 14, 12, 56, 23, 29 ]. If we put those numbers in order, we would have A’ = [ 3, 12, 13, 14, 23, 24, 29, 39, 56 ]. There are nine numbers. Our middle number will be the fifth number. The median of A is 23. If the size of the array is even, there are two medians, the “lower-median” and the “upper-median”. For example, assume
A = [ 3, 13, 39, 24, 14, 12, 56, 23, 29, 11 ]. Then, A’ = [ 3, 11, 12, 13, 14, 23, 24, 29, 39, 56 ]. There are now ten numbers so we do not have just one middle number, we have a pair of middle numbers: 14 and 23. 14 is the lower-median and 23 is the upper-median. For this homework you can assume that for inputs with even number of elements, the lower-median is the median.
Three alternative algorithms that find the median of an array are discussed below. Each algorithm has a different time complexity. The goal of this homework is to simulate the growth rates of all three algorithms using different inputs.
Algorithm 1 Finding the maximum number in an array can be done in O(n) time. We can simply find the maximum value in the array, eliminate it (move it to the beginning of the array), and continue doing this n/2 times. Hence, this algorithm runs in O(n2)-time. A pseudocode of this algorithm is given below:
1: procedure FIND MEDIAN 1(A, size)
2: count ← 0
3: while count < (size+1)/2 do
4: largest ← A[count]
5: indexOfLargest ← count
6: for i ← count to size-1 do
7: if A[i] > largest then
8: largest ← A[i]
9: indexOfLargest ← i
10: end if
11: end for
12: exchange A[count] ← A[indexOfLargest]
13: count ← count + 1
14: end while
15: return A[count-1]
16: end procedure
Algorithm 2 You can sort the input array in descending order, and select the middle element (that is the median) in the sorted array. The overall complexity of the algorithm will be dominated by the complexity of the sorting algorithm because median can be selected in O(1)-time once the array is sorted. You can use an efficient sorting algorithm (such as Quicksort (see http://en.wikipedia. org/wiki/Quicksort) that runs in O(nlogn) average time so that the average time complexity of the overall algorithm is also O(nlogn). A pseudocode of median finding algorithm utilizing Quicksort is given below:
1: procedure FIND MEDIAN 2(A, size)
2: QUICKSORT(A)
3: return A[(size-1)/2]
4: end procedure
Algorithm 3 There is a linear-time algorithm for finding the k’th largest number in an array. Read the section titled “Finding the k’th smallest value of an array” in Chapter 2 of the Carrano
book and the section titled “Partition-based selection” in the Wikipedia entry on selection algorithms (http://en.wikipedia.org/wiki/Selection_algorithm) for the description of an algorithm called Quickselect that has an average case time complexity of O(nlogn). Then, read the “Median of Medians algorithm” in the Wikipedia entry https://en.wikipedia.org/wiki/ Median_of_medians and the pseudocode and detailed analysis by Prof. David Eppstein at http: //www.ics.uci.edu/~eppstein/161/960130.html for a linear (O(n)) worst case time selection algorithm. Utilizing this selection algorithm, we can find the median by just selecting the (n/2)’th element of the array. A pseudocode of this algorithm is given below:
1: procedure FIND MEDIAN 3(A, size)
2: return SELECT(A, (size+1)/2) 3: end procedure
Assignment:
1. Study all of the algorithms above using the given references and understand how the upper boundsare found for the running time of each solution. Then, describe how the complexity of each algorithm is calculated in your own words (as a separate paragraph for each of the algorithms).
2. You are given the implementation of each algorithm (in files named findMedian.h and findMedian.cpp on the course web site) with the following prototypes
int FIND_MEDIAN_X( int input[], const int n );
where X is the algorithm number (X = 1,…,3). Each solution requires that the input array is allocated with size n, and the input array is filled with n integers by the calling function. The solution function finds and returns the median. Study all of these solutions.
3. Write a driver (main) function that generates an array that contains random numbers and calls each of these solutions with that array as input. Then, run each solution and record the execution times when different input sizes are used. You can vary n within a large range. You are expected to try many different input sizes, both small inputs and very large inputs (as large as around half million to observe the effects of different growth rates.
4. The time required for constructing the input array is not included in the complexity analysis. Makesure that you give exactly the same array as input to each solution. Note that the contents of the
5. Create a table containing all the values that you have recorded and include it in your report. In thistable, each row should correspond to a particular value of n and each column should correspond to one of the three algorithms.
6. Use these results to generate a plot of running time (y-axis) versus the input size n (x-axis). In particular, you are expected to produce a plot as in Figure 2.3 of the handout chapter on algorithm analysis. Make sure that your plots are of high quality and all details are clearly labeled.
7. Similarly, plot the expected growth rates obtained from the theoretical analysis (as given for eachalgorithm above) by using the same n values that you used in your simulations.
8. Compare the expected growth rates and the obtained results, and discuss your observations in aparagraph in your report.
9. Provide the specifications of the computer you used to obtain these execution times. Make surethat you report the CPU, RAM, and operating system specifications. You can use any computer with any operating system for this assignment.
You can use the following code segment to compute the execution time of a code block (by including the ctime header file):
//Store the starting time double duration; clock_t startTime = clock();
//Code block …
//Compute the number of seconds that passed since the starting time duration = 1000 * double( clock() – startTime ) / CLOCKS_PER_SEC; cout << “Execution took ” << duration << ” milliseconds.” << endl;
An alternative code segment is as follows (by including the chrono header file):
//Declare necessary variables std::chrono::time_point< std::chrono::system_clock > startTime; std::chrono::duration< double, milli > elapsedTime;
//Store the starting time startTime = std::chrono::system_clock::now();
//Code block …
//Compute the number of milliseconds that passed since the starting time elapsedTime = std::chrono::system_clock::now() – startTime; cout << “Execution took ” << elapsedTime.count() << ” milliseconds.” << endl;
Notes:
1. In this assignment, you MUST submit a report (as a pdf file) that contains all information requested above (description of the algorithms, table, plots, computer specification, discussion, etc.) and a cpp file that contains the main function that you used as the driver in your simulations in a single zip file. The name of this zip file should conform to the following name convention: secX-FirstnameLastname-StudentID.zip where X is your section number. You MUST NOT include the solution functions that we provided in your submission. The submissions that do not obey these rules will not be graded.
2. You should prepare your report using a word processor (in other words, do not submit images ofhandwritten answers) and convert it to a PDF file. Handwritten answers and drawings will not be accepted. In addition, DO NOT submit your report in any other format than PDF.
3. Make sure that each file that you submit (each and every file in the archive) contains your name, section, and student number at the top as comments.
Reviews
There are no reviews yet.