Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

ASSIGNMENT 2

CSC3021 Concurrent Programming 2021-2022

Question 1: Pumpkins at the Fair [10 marks]

Winnie the Witch has put up a tent at the local fair to display her pumpkins to visitors. Each pumpkin repeatedly becomes evil (we will not detail the horrors that occur when pumpkins are evil!), then returns back to normal. Each of the pumpkins is represented by threads that switch between these states, while each of the visitors and Winnie are represented by threads that repeatedly “enter” and “exit” the tent. All threads need to follow these rules:

· Visitors enter in groups of 3. No visitor may enter until all members of the previous group have left.

· Pumpkins may become evil only when there is precisely one visitor in the tent. At most one pumpkin may be evil at any time.

· Visitors may enter only when Winnie is already inside the tent, but Winnie may leave while visitors are present.

Write process definitions for the pumpkins, visitors and Winnie such that the above constraints are satisfied, and no additional constraints on concurrency are introduced. You may assume the following process structure:

Process Visitor[1..V]	Process Winnie	Process Pumpkin[1..P]
while(true){	while(true){	while(true){
//is entering	//is entering	//becoming evil
…	…	…
//has entered	//has entered	//is evil
…	…	…
//is leaving … //has left … } End Visitor	//is leaving … //has left … } End Winnie	//returning to normal … //is normal … } End Pumpkin

You may use semaphore variables and integer variables. You do not need to consider starvation issues but your code must be free of deadlock.

Ensure your hand-in contains pseudo-code that is clear to read and unambiguous.

Question 2: Graph Processing - Introduction [15 marks]

This question lays the foundation for the remaining assessments on the implementation of concurrent graph processing operations. In this assignment you will complete the implementation of a graph processing program. A skeleton program is available in a git repository. 1 You will extend the code by adding concurrency to it in Question 3. Before attempting that, you first need to thoroughly understand the sequential program. Following Kristen Nygaard’s “To program is to understand”, you will program the key steps yourself.

The code models graph analytics applications as a repeated graph traversal. Each traversal operates like a higher-order map method: all edges of the graph are visited and a user-supplied method is called for each edge. This method performs a function specific to the graph analytics. For instance, in PageRank it consists of a floating-point addition, while the label propagation algorithm performs a ‘minimum’ operation on integers. The actual operation performed is not of immediate interest to us. We will focus our attention on making the higher-order edgemap method execute in parallel. An accompanying document, available on canvas, gives additional information on the PageRank problem. It is useful to familiarise yourself with the PageRank problem for debugging and validation purposes.

The code may operate using one of three graph data structures (CSR, CSC or COO, defined in the slides). Implement the omitted code for calculating out- degrees (an auxiliary method required by PageRank) and for the edgemap method for each of the COO, CSR, and CSC sparse matrix formats. You only need to modify the files SparseMatrix{COO|CSR|CSC}.java. Keep the

command-line interface of the DriverA2.java program as is and keep the damping factor at 0.85. Test your code thoroughly as you will extend it in the next question. Confirm that all three versions calculate the same PageRank values, up to an error tolerance of 10-7.

Perform following experiments using your implementation and the orkut_undir2

graph and briefly answer these questions:

(i) Record the residual error (printed to the console by the PageRank operation) at the end of each iteration of the power method for each of the three matrix formats. Plot these values in a chart (consider log-scale). Summarise your observations.

(ii) Record the execution time taken by the PageRank algorithm for each of the three graph data structures (CSR, CSC and COO). Plot these in a

1 https://hpdc-gitlab.eeecs.qub.ac.uk/hvandierendonck/csc3021_assignments_2122 You cannot login to this git server; the repository is publicly accessible. You can clone this repository and work off it. Check the README.md file for instructions.

2 http://www.eeecs.qub.ac.uk/~H.Vandierendonck/CSC3021/graphs

chart. Can you explain why some formats result in lower execution time than others?

(iii) Modify your code to use single-precision floating-point numbers instead of double-precision floating-point numbers for one of the three sparse matrix formats by performing a search-replace operation (this affects the PageRank.java and Driver.java files). Record the residual error at the end of each power iteration step and the average execution time when using single-precision vs. double-precision floating-point numbers. Report these numbers using charts. Can you explain this result?

(iv) Identify the most time-consuming parts of the code. The graph processing code consists of three main steps: (1) reading in the file; (2) performing the graph analytics algorithm; (3) printing out results. We will further decompose step (3) in two parts: (3a) the time spent in the edgemap method, and (3b) the remainder of step (3). You will need to add additional timers to the code to collect all of these timings. Execute both your sequential implementation of the PageRank problem and the Connected Components problem on the orkut graph. Report timings in a chart. Which component takes the largest amount of time?

(v) Using Amdahl’s Law, and the data collected in (iv), predict the best-case speedup that can be obtained when executing the edgemap method on 4 threads.

Writeup: Report and discuss your measurements and present your predictions for speedup. Discuss the PageRank problem in all question parts, and also the Connected Components problem in parts (iv) and (v).

Question 3: Parallel Edgemap [10 marks]

Implement a parallel version of the edgemap method using the CSC sparse matrix format. To this end, you will partition the set of edges (split the set of edges in multiple, non-overlapping subsets such that each edge appears in exactly one of the subsets) and make each thread process a different partition. Modify the program to:

1. Create a configurable number of threads. The number of threads will be passed in as an additional argument to the program.

2. Create a version of the edgemap method for the CSC format with signature public void edgemap( Relax relax, int start, int end ), which will edgemap only over the vertices in the range start to end (end not inclusive).

3. Distribute the iterations of the outer loop of the edgemap method (approximately) equally over the threads by determining the start and end vertex IDs for each thread. Keep in mind that the number of vertices may not be a multiple of the number of threads. For instance, if there are 17 vertices and 3 threads, then a first thread may iterate over the vertices 0,…,5, the second thread may iterate over 6,…,11 and the final thread may iterate over 12,…,16.

4. Run each thread on their designated vertices.

5. Wait for all threads to complete by joining them.

To help you with these changes, the graph processing source code contains an abstract class ParallelContext that initiates the concurrent execution. The ParallelContext is set by the Driver at program startup and used by the PageRank and ConnectedComponents classes to parallelise the execution of edgemap. You have previously used the ParallelContextSingleThread which merely calls edgemap on the appropriate matrix. A place holder is provided for ParallelContextSimple where you extend the edgemap method to take the additional steps outlined above. All changes can be implemented within the ParallelContextSimple class, except for the selection of the context in the driver.3

Validate the correctness of your code by cross-checking the residual error for each iteration with the sequential code you obtained in Question 2. We will also release a validation program.

Writeup: Report the execution time for your concurrent solution when processing the orkut_undir graph and when using a varying number of threads. Select 1, 2, 3, … T+2 where T is the number of physical processing cores (including hyperthreads) of the computer that you are using for these measurements. Plot these measurements in a chart. Report the number of processor cores and hyperthreading arrangement on the computer that you are using. Discuss the achieved speedup you have obtained for PageRank and for Connected Components. Can you achieve the speedup predicted by Amdahl’s Law?

Input Files

You can find three graph data sets in CSR, CSC and COO format at http://www.eeecs.qub.ac.uk/~H.Vandierendonck/CSC3021/graphs

These graphs have varying sizes and will result in varying processing times. All of these can be comfortably processed in a few minutes on the lecturer’s laptop, which has a 1.8 GHz Intel Core i5 processor and 8GB main memory.

Each file begins with a one-line header stating the graph format: COO, CSC, CSR or CSC-CSR. The latter applies only to undirected graphs, which have the property that the CSC and CSR representations are equal. The next two lines then specify the number of vertices and number of edges. Then, the file formats diverge:

· The COO format contains one line per edge, each containing two integers: the ID of the source vertex followed by the ID of the destination vertex.

· The CSR format contains one line per vertex. The first integer on the line is the ID of the source vertex. Every subsequent ID on the same line specifies an edge from the first vertex to the listed vertex.

· The CSC format contains one line per vertex. The first integer on the line is the ID of the destination vertex. Every subsequent ID on the

3 You are allowed to make changes elsewhere in the code if you find that easier.

same line specifies an edge from the listed vertex pointing to the first vertex listed on that line.

The skeleton program already contains the required code to parse the files. Your task is to insert the edges in the data structures that you designed.

Submission

Your submission will consist of three parts: your completed code for Q2 and Q3 submitted to a git repository on gitlab2.eeecs.qub.ac.uk, the same code submitted to an online test server, and a write-up.

· Develop your code through a private git repository hosted on gitlab2.eeecs.qub.ac.uk to which you will provide access to the user [email protected]. When you add this user, I will be emailed about the details of the repository.

Make sure to make frequent commits to the repository as the commit history can inform the marking too in case your program is not fully functioning.

· Your code will be submitted through an online testing and evaluation system available at http://hvan.eeecs.qub.ac.uk/ using the contest ‘2122_pr’. This system will test the correctness of your code in a few sample situations by checking the correctness of the computed values. The relative ranking of your solution compared to others is irrelevant. You will be informed of your login credentials for this site in due time.

· The write-up is a PDF document containing your answers to Questions 1-3. Submit it through Canvas. Do not put your name, student ID or other identifying information in the write-up as it will be reviewed anonymously.

Constrain your hand-in to 5 sides of an A4 page. Estimated page requirements: Q1: 1 side; Q2: 2-3 sides; Q3: 1 side. The minimum font size should be 11 point and margins must be at least 2cm in all directions. Only a standard ‘Arial’ or other ‘Sans Serif’ font should be used. The font size of legends and labels on charts will need to be adjusted manually as neither Microsoft Word or Excel come with good default settings. Excess pages may be ignored during marking.

Guidance: Mark distribution:

Q1: 10 marks; Q2: 15 marks; Q3: 10 marks.