闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

18-645: How to Write Fast Code

Assignment 4:- Shared Memory Parallelism

General Instructions.

– A PDF report and a zipped ﬁle is to be submitted via Canvas on the due date. Both ﬁles should be named asg 4 《your andrew id> with the approprate ﬁle extensions.

– The zipped ﬁle should contain the following

1. Source code (appropriately named) for all implementations;

2. A Makeﬁle that compiles all implementations on one of the machine on the ECE cluster;

3. Readme.txt that explains how to run your implementations. (Optional, but strongly recom- mended)

– Each program has to be named using the exact name found in parenthesis in the question.

1. OpenMP parallelism The source ﬁle (triangle.c) contains a function for computing the number of triangles in a graph stored in compressed-row-format. Parallelize the function using the three methods of parallelization in OpenMP

(a) Parallel For.(tri for.c) Parallelize the triangle counting function with the parallel for con-

struct.

(b) Parallel Region.(tri par.c) Parallelize the triangle counting function by creating parallel regions

in the function.

For all three implementations, provide plots of the execution time of your implementations for 1, 2, 4, 8, 16 and the total number of hardware threads available on the provided datasets.

Hint:

– All iterations of the outer loop can be done in parallel but a reduction of across iterations must be performed

– There is no reduction clause for task parallelism.

– The perfomances are not expected to be identical.

2. Eﬀects of scheduling Recall that there are diﬀerent scheduling mechanism that can be chosen with OpenMP pragmas. Update your parallel for implementation with the following scheduling schemes:

(a) Dynamic scheduling. (tri dyn.c)

(b) Chuck scheduling with chunk size of 1. (tri one.c)

(c) Chuck scheduling with chunk size of N/p.(tri Noverpc) N is the number of vertices in the graph, and p is the number of threads being used.

Provide plots of timings for for 1, 2, 4, 8 16, and the total number of hardware threads available on the provided datasets.

3. Compare the performance of the new implementations with the implementation (tri for.c) without explicit scheduling. Explain why you are seeing diﬀerences (if any) in the performance.

4. Miscellaneous. How many hours did it take you to complete this assignment?