
Coursework: High-performance computing

Module: Introduction to HPC (PHYS 52015)

Term:  Michaelmas term, 2022

Submission Please submit a zip file containing: part1.pdf, part1.c, part2.pdf, part2.c.

Deadlines Consult the MISCADA learning and teaching handbook for submission deadlines.

Plagiarism and collusion Students suspected of plagiarism, either of published work or work from unpublished sources, including the work of other students, or of collusion will be dealt with according to Computer Science and University guidelines.

Description

Throughout the course we have considered simple programming problems largely distinct from the typical day-to-day practice of scientific computing. In this assignment you will experience a sliver of that practice by inheriting a codebase it is your responsibility to make faster while maintaining correctness.

Reaction-diffusion systems

Consider the following partial differential equation (PDE) system, which is a variant of the FitzHugh-Nagumo model,

$$\dot{u} = \nabla^2 u + u(1-u)(u-b) - v, \qquad \hat{n} \cdot \nabla u = 0,$$

$$\dot{v} = d\,\nabla^2 v + c(au - v), \qquad \hat{n} \cdot \nabla v = 0,$$

together with the discrete squared solution norm

$$\|w\|_2^2(t) = \sum_{i,j} w^2(t, x_i, y_j), \qquad \text{for } w \in \{u, v\}.$$

This is an example of a reaction-diffusion system — the reaction is the second term on the right-hand side, and the diffusion is the first term on the right-hand side. These models produce a wide array of patterns, from cardiac arrhythmias to animal coat patterns. See here for an interactive example of this particular model.

You will be supplied with a two-dimensional reaction-diffusion code serial.c which additionally tracks the solution norms over time, using the third equation for $\|w\|_2^2(t)$. Your assignment will be to parallelise the code using OpenMP and MPI, and to explain your decisions with theoretically sound arguments and measurements of performance.

A note about correctness: For the purposes of this assignment you should preserve the model parameter values a, b, c, and d in the code serial.c. The relationships between the time-step dt, the space-step dx, and the diffusion scaling parameter DD are likewise very important for generating a coherent pattern; these should not be modified. Likewise, to ensure you are making fair comparisons, M should not vary run-to-run, and should be large enough that you can measure the parallel performance and not just the start-up time; the default value should be good. If you modify any of the const parameters from the serial implementation in your work, you must say so explicitly in your write-ups, and it must be consistent across your implementations.

Additionally, the boundary conditions are folded into the evaluation of the diffusion term: when i±1 or j±1 exceeds the range of u or v, the code mirrors these 'ghost points' across the boundary back into the domain, e.g., u[-1] = u[0]. (See our exercise on the heat equation for a reference.) Your implementation should retain this behaviour.

Finally, explicit compilation commands that your code must satisfy are given in the following sections.

Your code will be marked on Hamilton 8, so if you develop your programs on your own computer, you should test that they work on Hamilton with the gcc and openmpi modules loaded. For scaling results you should measure the time of the whole executable, rather than the time for any subsection of the program, e.g. using the Unix command time.

Part 1: OpenMP

In this assessment, you will compile and run a serial two-dimensional reaction-diffusion code, and compare its performance against a parallelised version that you will write. The serial code consists of five functions: init, dxdt, step, norm, and main. The expectations for your implementation are to use OpenMP #pragmas to:

•  Parallelise the function init.

•  Parallelise the function dxdt.

•  Parallelise the function step.

•  Parallelise the function norm.

Your code should be in a single C file named part1.c. Your code should compile with the command gcc -fopenmp part1.c -lm -o omp_rd and, when run with ./omp_rd, produce the same (up to the vagaries of floating-point precision) outputs as the serial code.

You will then time your parallel version to investigate the strong scaling of your implementation, and make recommendations for an optimal level of threading based on problem size and performance metrics. Your write-up should be no more than one (1) page, in a file named part1.pdf.

Questions to consider: What options for parallelisation are available? Why are some more suitable than others?  What difficulties arise in the parallelisation?  Where are the necessary synchronisation points? The solution norm requires the generation of a single output number from an N-by-N array; what patterns are available for this function? How did you avoid data races in your solution? Is your parallelisation approach the best option? What alternative techniques could be used?

Hint: The serial code does not have parallel data races, by definition, and avoids the more subtle data races of in-place updates by only reading from or writing to the state variables in each function. For the simplest parallelisation of the space loops, you will have a data race due to the diffusion term. When parallelizing the computation of the solution norms, you should consider how best to reduce any intermediate calculations. (Optional) You may find it beneficial to refactor the code to separate the reaction and diffusion contributions, or otherwise split the update, to avoid thread contention when accessing the same memory.

Part 2: MPI

In this assessment, return to the serial implementation of the two-dimensional reaction-diffusion system, and parallelise the code using MPI calls, breaking the domain down into distributed regions. Your implementation should:

•  Reproduce the correct initialization of u and v.

•  Use peer-to-peer exchanges for the border indices of u and v across processes.

•  Correctly calculate the norms of u and v across all ranks.

•  Correctly evaluate the diffusion term on all ranks.

Your code should be a single C file called part2.c. Your code should compile with the command mpicc part2.c -lm -o part2, and run correctly with mpirun -np 4. Note: your MPI code should run correctly with 4 processes.

Justify your parallelisation scheme theoretically, in terms of the data passed between processes and in terms of the theoretical weak scaling of the implementation. Use the parallel fraction estimated in Part 1 and timing results with 4 processes to estimate the weak scaling. (Optional): Time your implementation with 1, 2, and 8 processes to better justify your estimates for weak scaling. Your write-up should be no more than one (1) page, in a file called part2.pdf.

Questions to consider: What topologies for distribution are available with 4 MPI processes? Why might some be preferred over others? What difficulties arise in the parallelisation? The solution norm requires the generation of a single output number from a large distributed array — what patterns are available for this problem? If we assume that u and/or v change slowly compared to the time-step, do any further optimisations for the data exchanged become available? (Optional): What are some constraints on the possible domain sizes and number of MPI processes for your solution? Can you parameterise the domain size and number of MPI processes in order to measure the actual weak-scaling performance?

Hint: Since you are free to restrict each MPI process to execute local commands serially, the actual challenge of this assignment is to correctly distribute and share the spatial information, as distinct from your OpenMP implementation. In particular, note that the size of the local u and v arrays now depends on the number of MPI processes, and the indices used for the traversal of u and v will change accordingly; you will need to account for that.

Submission format

Your submission should be a single zip file, containing part1.c, part2.c, part1.pdf, and part2.pdf.

Grading

Artifact        | Descriptor                                                              | Marks
----------------|--------------------------------------------------------------------------|------
All code        | Compiles with no errors or memory leaks                                  | 10
Part 1 code     | Correct implementation of the parallel reaction-diffusion model using OpenMP | 30
Part 1 write-up | Description and justification of your parallelisation scheme, with benchmarks for strong scaling, and estimation of the parallel fraction | 20
Part 2 code     | Correct implementation of the parallel reaction-diffusion model using MPI | 20
Part 2 write-up | Description and justification of your parallelisation scheme, with estimates for weak scaling, based on the estimation of the parallel fraction | 20

Appendix

Generic coursework remarks

Stick exactly to the submission format as specified. If you alter the format (submit an archive instead of plain files, use Word documents rather than PDFs, . . . ), the marker may refuse to mark the whole submission. Markers will not ask for missing files. If you have to submit code, ensure that this code does compile and, unless specified otherwise, does not require any manual interaction. Notably, markers will not debug your code, change parameters, or assess lines that are commented out.

All of MISCADA’s deadlines are hard deadlines:  In accordance with University procedures, submissions that are up to 5 working days late will be subject to a cap of the module pass mark. Later submissions will receive a mark of zero. If you require an extension, please submit an official extension request including medical evidence and/or acknowledgement by college. Do not contact the lecturers directly, as lecturers are not entitled to grant extensions. Details on extensions and valid reasons to grant extended deadlines can be found in the Learning and Teaching Handbook.

It is the responsibility of the student to ensure that there are sufficient backups of their work and that coursework is submitted with sufficient slack. Submit your coursework ahead of time. If in doubt, submit early versions. Technical difficulties (slow internet connection around the submission deadline, lost computer hardware, accidentally deleted files, . . . ) will not be mitigated. Please see https://www.dur.ac.uk/learningandteaching.handbook/6/2/6/ for further information regarding illness and adverse circumstances affecting your academic performance.

If collusion or plagiarism are detected, both students who copy and students who help to copy can be penalised. Do not share any coursework with other students, do not assist other students, and cite all used sources, including figures, code snippets, equations, etc. Please see https://www.dur.ac.uk/learningandteaching.handbook/6/2/4 and https://www.dur.ac.uk/learningandteaching.handbook/6/2/4/1 for further information.

Generic report quality criteria

Where summative coursework is assessed through written work in the form of a report, the report will be assessed against some generic criteria.

The relevant grade bands (as percentages) are

0–49: Fail
50–59: Pass
60–69: Merit
70–79: Distinction
80–100: Outstanding

A fail-level report displays an unsatisfactory knowledge and understanding of the topic. The setup and evaluation of any experimental studies is incomplete. It contains many omissions or factual inaccuracies. Limited in scope and shows little or no evidence of critical thinking and application of the course material to the problem. No recognition of limitations of the approach or evaluation. Experimental data are generally presented incorrectly, or without clarity.

A pass-level report displays some knowledge and understanding of the topic.  The setup and evaluation of any experimental studies is competent.  May contain some omissions or factual inaccuracies. Evidence of critical thinking and application of the course material to the problem occurs in some places. Has some recognition of limitations of the approach or evaluation. Most experimental data are presented correctly and clearly.

A merit-level report displays good knowledge and understanding of the topic as presented in the course material.  The setup and evaluation of any experimental studies is careful and detailed. Broadly complete in scope, with few or no errors. Evidence of critical thinking and application of the course material to the problem is mostly clear throughout. Recognises limitations of the approach or evaluation, and has some discussion on how to overcome them. Experimental data are presented correctly and clearly.

A distinction-level report displays effectively complete knowledge and understanding of the topic. The setup and evaluation of any experimental studies is well-motivated and near-flawless. Effectively no errors. Evidence of critical thinking and application of the course material to the problem is clear throughout, and some of the discussion goes beyond the taught material. Recognises limitations of the approach or evaluation, and presents plausible approaches to overcome them. Experimental data are presented carefully and with attention to detail throughout.

An outstanding-level report is similar to a distinction-level report but is effectively flawless through- out, and shows a significant independent intellectual contribution going beyond the taught material.