Performance Programming Coursework 1


Introduction

The overall aim of the Performance Programming coursework is to take a serial application and improve its performance on the compute nodes of Cirrus. The coursework is split into two parts, with the first part focussed on optimising the application using the compiler, and the second part focussed on hand-optimising the code itself. This document outlines coursework 1, optimising the application using the compiler.

Both pieces of coursework are assessed through a written report detailing the work undertaken and performance achieved. Note that the target platform is Cirrus and its associated software stack. If you do not already have access to Cirrus please contact the course organiser.

We will be using a simple molecular dynamics code, which simulates the movement of particles over time. The starting source code is available on Learn and is called MD_2021.tar.

There are both C and Fortran versions of the code available. You should select one of these versions for use in these courseworks, and work only on that version.


Running the program

As provided, the program reads an initial state from the file input.dat and then performs 5 blocks of 100 timesteps, writing an output file after each block. The output files are in the same format as the input file, so you can use any output file as the input for a shorter performance test that performs fewer than 500 iterations. The code reports timing information for each block of 100 timesteps and for the loop over blocks, which includes file access operations.


Checking correctness

Note that optimising the code may change the floating point results slightly, so a simple diff on output files is not a useful verification test. The subdirectory Test contains a C program which, when compiled, can be used to test that two output files from the MD code are the same to within an acceptable tolerance. The syntax for this is:

  diff-output file1 file2

This program will not detect the presence of NaN values in the files it compares, so you should test for these explicitly, either by extending the diff-output program or by creating a small program or script to check the output yourself.
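One way to carry out the explicit NaN check is a short script along the following lines. This is only a minimal sketch: it assumes the output files are plain text made up of whitespace-separated numbers, which you should confirm against the files the MD code actually writes. The function names find_non_finite and check_file are hypothetical and are not part of the supplied code.

```python
import math


def find_non_finite(text):
    """Return (line_number, token) pairs for every NaN or infinite value in text.

    Assumption: the file is plain text with whitespace-separated fields.
    """
    bad = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                # Tokens that Python cannot parse as a number are skipped
                # in this sketch; adapt this if your files contain them.
                continue
            if not math.isfinite(value):
                bad.append((line_number, token))
    return bad


def check_file(path):
    """Read one output file and return its non-finite entries."""
    with open(path) as handle:
        return find_non_finite(handle.read())
```

Calling check_file on each output file and treating any non-empty result as a failure gives a simple pass/fail check to run alongside diff-output.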

In addition, very small numerical differences will be magnified over time, particularly once the particles start to collide, so the verification test is unlikely to pass for more than 200 timesteps from a common starting point. The verification test is intended as a guide rather than a definitive test of correctness, so you need to give some thought to how you test for correctness. We suggest building tests using blocks of 100 iterations (timesteps) from a region of the simulation after the particles have started to collide.


Assignment

The assignment for coursework 1 is to produce a report (around 5 pages including figures) on optimising the application using the compiler. The report may contain additional appendices if you wish, though the assessment of coursework 1 will be based on the main report. The report should present the results of your work investigating and improving the performance of this code using the compiler only. The source code is provided with Makefiles for the C and Fortran versions of the code, but these Makefiles may not include the optimal compiler flags and options for this application. For coursework 1 your task is to investigate improving the performance of the application on Cirrus using different compiler flags and, potentially, different compilers, to get the best performance possible without altering the code itself.

The report should outline the compiler and compiler flags you have chosen, and the performance achieved with those compiler flags. This should be an iterative process, with you investigating the effect of different levels of compiler optimisation on the performance and correctness of the application. You should summarise which compiler flags you would suggest using for the application based on the experiments you have undertaken.
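The iterative process described above could be driven by a small script such as the sketch below. It rests on several assumptions that you should check against the supplied Makefiles: that the compiler flags can be overridden through a variable named CFLAGS (FFLAGS is a common name in Fortran Makefiles), that a clean target exists, and that the compiler is GCC, since other compilers spell their options differently. The flag sets listed are only a starting point, and the function names are hypothetical.

```python
import subprocess

# Candidate flag sets, written for GCC. This list is an illustrative
# starting point, not a recommendation.
FLAG_SETS = [
    "-O0",
    "-O1",
    "-O2",
    "-O3",
    "-O3 -march=native",
    "-Ofast -march=native",
]


def build_command(flags, variable="CFLAGS"):
    """Return a make invocation that overrides the flag variable.

    Assumption: the Makefile reads its flags from `variable`.
    """
    return ["make", f"{variable}={flags}"]


def rebuild(flags, source_dir=".", variable="CFLAGS"):
    """Clean and rebuild the code with one set of flags.

    Assumption: the Makefile provides a `clean` target.
    """
    subprocess.run(["make", "clean"], cwd=source_dir, check=True)
    subprocess.run(build_command(flags, variable), cwd=source_dir, check=True)
```

After each rebuild you would run the program, record the timings it reports, and repeat the correctness check, because options such as -Ofast relax floating point rules and can change the results.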

Normal performance optimisation procedure would be to start with profiling the application to obtain information about performance. However, for this coursework, where we are restricting ourselves to purely optimising through the compiler, you do not need to profile the application.

Remember that, as with all performance reports, you should document the environment you are running your tests in (i.e. what hardware you are using, what compilers, etc.) and make sure your results are reproducible by running each benchmark multiple times. You can report whichever statistic you wish (average, minimum, maximum), provided you state what you did in your report and apply it consistently. File I/O times do not need to be considered and can be omitted from timing results.
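As an illustration of summarising repeated runs, the sketch below takes the timings you have collected for one configuration and returns the statistics mentioned above. It assumes you have already extracted the times (in seconds) from the program output yourself; the function name summarise is hypothetical.

```python
import statistics


def summarise(times):
    """Summarise repeated timings (in seconds) of one configuration."""
    return {
        "runs": len(times),
        "min": min(times),
        "mean": statistics.mean(times),
        "max": max(times),
        # The sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }
```

Recording the number of runs and the spread alongside the headline figure makes it easier for a reader to judge how reproducible a result is.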

This coursework is marked on the report you submit, so the report should be a stand-alone document that includes discussion of the compiler flags chosen and the performance observed for different compiler flags. You may also experiment with different compilers on Cirrus to evaluate performance across compilers.


Marking scheme

The report will be marked on:

• Demonstrated understanding of the performance issues: both the problems in the original code and the results of changes made to the code (35).

• Discussion of the proposed optimisations: their impact on performance as well as code quality (35).

• Methodology used in the assignment as demonstrated in the report. This includes general approach, tools used etc. (20).

• Clarity, relevance and presentation of the report (10).

Coursework is due at 16.00 on Friday 12th March.

As per the University's Taught Assessment Regulations (for further information, see the link on the Learn course Assessment page), assignments submitted after the deadline without an approved extension (see the Student Support page on the Learn course) are subject to a penalty of 5% per day (i.e. per 24 hours) that the assignment is late, up to a maximum of seven days. Assignments handed in more than seven days late receive zero marks.