CS450: Introduction to Parallel Programming Assignment 7 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CS450: Introduction to Parallel Programming
Assignment 7: Improving Application Performance Using OpenMP and Algorithmic Transformations
2022
Preliminaries
You are expected to do your own work on all homework assignments. You may (and are encouraged to) engage in discussions with your classmates regarding the assignments, but specific details of a solution, including the solution itself, must always be your own work. See the academic dishonesty policy in the course syllabus.
Submission Instructions
You should turn in an electronic archive (.zip, .tar., .tgz, etc.). The archive must contain a single top- level directory called CS450 aX NAME, where “NAME” is your NAU username and “X” is the assignment number (e.g., CS450 a7 ab1234). Inside that directory you should have all your code (no binaries and other compiled code) and requested files, named exactly as specified in the questions below. In the event that I cannot compile your code, you may (or may not) receive an e-mail from me shortly after the assignment deadline. This depends on the nature of the compilation errors. If you do not promptly reply to the e-mail then you may receive a 0 on some of the programming components of the assignment. Because I want to avoid compilation problems, it is crucial that you use the software described in Assignment 0. Assignments need to be turned in via BBLearn.
Turn in a single pdf document that outlines the results of each question. For instance, screenshots that show you achieved the desired program output and a brief text explanation. If you were not able to solve a problem, please provide a brief write up (and screenshots as appropriate) that describes what you tried and why you think it does not work (or why you think it should work). You must provide this brief write up for each programming question in the assignment.
This pdf should be independent of the source code archive, but feel free to include a copy in the top level of that archive as well.
Overview
A simple operation that is used in many applications is calculating distances between points. In this assignment, we will use the Euclidean distance. For example, if p1 = (x1 , y1 ) and p2 = (x2 , y2 ), then the distance is as follows: distance(p1 , p2 ) = ^(x1 . x2 )2 + (y1 . y2 )2 . In this assignment, you will calculate the distance between points, and report the total number of distances between points that are within some distance threshold, ∈ . Thus, you will count the total number of occurrences distance(p1 , p2 ) e ∈ . As ∈ increases, the total number of points within ∈ should increase. For validation instructions, see the end of the assignment.
Illustration
Consider Figure 1 with p1 = (2, 2), p2 = (3, 4), p3 = (5, 2), p4 = (.4, 4), p5 = (.2, .3), p6 = (2, .1). An example calculation of the distance between a few pairs of points (to a few decimal places) are as follows:
● p1 and p2 : ^(2 . 3)2 + (2 . 4)2 = ^5 = 2.236
● p2 and p3 : ^(3 . 5)2 + (4 . 2)2 = ^8 = 2.828
|
|
|||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
p |
4 |
|
|
|
|
|
p |
2 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
p |
1 |
|
p |
3 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5 |
4 |
3 |
2 |
1 |
0 |
p |
2 6 |
3 |
4 |
5 |
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
p |
5 |
. |
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
Figure 1: Example. See text for details.
● p6 and p5 : ^(2 . (.2))2 + (.1 . (.3))2 = ^20 = 4.472 Consider all the points in Figure 1.
● If ∈ = 1 or ∈ = 2, then no pairs of points are within ∈ .
● If ∈ = 3: d(p1 , p2 ) e ∈ , d(p2 , p3 ) e ∈ , d(p1 , p6 ) e ∈ , d(p1 , p3 ) e ∈ .
● If ∈ = 100: all points are within ∈ of each other.
Details
The Euclidean distance is symmetric, i.e., distance(p1 , p2 ) = d(p2 , p1 ). You must “double count” these pairs. That is, if p1 and p2 are within ∈ then p2 and p1 are also within ∈, and both must be counted towards the total number of points within ∈ of each other. You must also add to the total count the distance between a point and itself. E.g., distance(p1 , p1 ) = 0 will always be within ∈ . These requirements make the total count easier to compute, as these are considered corner cases.
If ∈ = 100 in Figure 1, then all points are within ∈ of each other. Including the “double counting” above, and a point counting itself, this means that the total number of counts you should compute are: 62 = 36.
If ∈ = 3 (see example above), then the total count of points within ∈ of each other should be 14. All points are within ∈ of themselves (6), and the 4 pairs of points within ∈ above yield 4 <2=8.
Program Guidelines
Your program will take as input on the command line a value for ∈ (declared as a double precision float). Your program will output the total number of point comparisons that are within ∈ . I will provide you with starter code that passes ∈ in on the command line.
Table 1: Measurements and metrics for the subquestions in Question 1. All response times have been averaged over 3 time trials, each using separate executions of the program. The column “No Opt.” refers to executing the program without compiler optimization. Blank lines indicate values that you need to add to the table.
Subquestion (e = ) |
#Cores |
Time (s) No Opt. |
Time (s) w/ -O3 |
Speedup |
Parallel Efficiency |
Sequential |
1 |
|
|
1 |
1 |
(a) |
|
|
|
|
|
Sequential |
1 |
|
|
1 |
1 |
(b) |
|
|
|
|
|
Question 1: Baseline with OpenMP
●Write the brute-force sequential algorithm to compute the total number of points that are within e according to the guidelines above. This algorithm compares all points to each other, calculating the distance and determining if they are within e. Hint: the algorithm is O(N2), where N is the input size (total number of data points).
●Parallelize the sequential code with OpenMP. The number of threads you use should match with the number of threads or your computer/virtual machine.
● Compare the execution time of the sequential vs. parallel algorithm.
●For all items below, include the number of seconds it took to run the program. Report an average of 3 time measurements.
(a) Without compiler optimization, report the number of cores, speedup, and parallel efficiency achieved. (b) With compiler optimization (-O3), report the number of cores, speedup, and parallel efficiency achieved.
(c) For the items above, reason about your performance gains. How is your speedup or parallel efficiency? Explain.
Example compilation using optimization: g++ main .cpp -o main -O3
Example of running the program with the e = 10 as a parameter: ./main 10 .0
Submission:
I provide you with point_epsilon_starter .cpp. This code generates the data, gets e from the command line, and shows you where to time your code. Download it, rename it question1_NAME .cpp, complete the question, and turn in an archive with this file. Make sure that you include responses to Questions (a)– (c) above in your pdf writeup. In addition, fill out a table that looks like that shown in Table 1.
Question 2: Algorithmic Transformation
The above algorithm is O(N2), which is not very efficient. For instance, in Figure 1, if e = 3, then we know visually that there is no reason to compare p3 and p4 , because they are clearly far away from each other. Modify your brute force O(N2) algorithm to be more efficient (i.e., fewer total distance calculations, fewer floating point operations, or other ways of reducing the computational load of comparing points). Parallelizing a program to get better performance is good. But in the end, not computing something remains more efficient! This will require an algorithmic transformation, meaning that you cannot use the brute force algorithm in Question 1. You will need to be creative to reduce the total computational load. Use OpenMP to parallelize the code. Remember that the algorithm should still give the correct result for a given value of e.
Report the following information:
(a) A description of how your new algorithm works. That is, how does it reduce the computational load in comparison to the O(N2) algorithm? What overheads does it have? Overheads here refer to the
Table 2: Measurements and metrics for the subquestions in Question 2. All response times have been averaged over 3 time trials, each using separate executions of the program. “No Opt.” refers to executing the program without compiler optimization. Blank lines indicate values that you need to add to the table.
2022-07-28