Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

KIT308 Multicore Architecture and Programming

Assignment 1 Multithreading

Aims of the assignment

The purpose of this assignment is to give you experience at writing a multithreaded program for the CPU. This assignment will give you an opportunity to demonstrate your understanding of:

• creating and handling multiple threads;

• partitioning work;

• dynamic scheduling; and

• data-sharing between threads.

Due Date

11:55pm Friday 19th of August (Week 6 of semester)

• Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have been followed (see the School Office or this link for details) assignments which are submitted late without good reason will be subject to mark penalties if they are accepted at all (see the School office or this link for details on this as well).

• Forms to request extensions of time to submit assignments are available from the School of Engineering and ICT office. Requests must be accompanied by suitable documentation and should be submitted before the assignment due date.

Due to the tight schedule of this unit (and the reliance of future assignments on a solution to this one), extensions will be unable to be granted and late assignments will not be accepted. If exception circumstances occur for an individual student they must contact the lecture-in-charge before the assignment due date to arrange alternative methods of assessment in this unit.

Assignment Submission

Your assignment is to be submitted electronically via MyLO and should contain:

• A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the format provided in the downloadable materials provided below).

• A document containing:

o A table of timing information comparing the original single-threaded times against each of three stages on each scene file in the Drovided SDreadsheet.

。A written analysis of the above timing data.

You do not need to (and shouldn't) submit executables, temporary object files, or images. In particular; you must delete the ".vs" diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however delete the "Scenes" folder or the "Outputs" folder (but do delete the images within this one).

Marking

This assignment will be marked out of 100. The following is the breakdown of marks:

Task/Topic

Marks

1. Basic Multithreaded CPU Implementation

30%

Correct (5%) and elegant (5%) thread management

(i.e. the correct image for any number of threads (even >64 threads)

10%

Correct use of "-thread" command-line argument

5%

Successful implementation of "-testMode" and "-colourise" command-line arguments

5%

Correct partitioning of render for images with height equally divisible by thread count

(i.e. the correct image, with exactly equal work area for each thread, and with the correct number of samples rendered for: Tests lf 2, 4, 5, and 7 with 8 threads)

5%

Correct partitioning of render for images with height NOT equally divisible by thread count

(i.e. the correct image, with work area allocated "evenly" across threads, and with the correct number of samples rendered for: Tests 3 and 6 with 8 threads, and also Test 1 with 3, 5, 6, 7, and 513 threads)

5%

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation

20%

Correct (5%) and elegant (5%) thread management

10%

Correct (5%) and elegant (5%) data-sharing between threads

(i.e. no possibility of threads "fighting" over shared memory locations)

10%

3- Dynamically Balanced Square-Based Multithreaded CPU Implementation

30%

Correct and elegant thread management

5%

Correct and elegant data-sharing between threads

5%

Correct use of "-blockSize" command-line argument

5%

Correct partitioning of render into squares for images with size that is equally divisible by block size

(i.e. the correct image, with exactly equal work area for each block, rendered for: Tests 1, 2, 5, and 7 with block size 8 and 64)

5%

Correct partitioning of render into squares for images with size that is NOT equally divisible by block size

(i.e. the correct image, with work area for each block either at block size or smaller at edges, rendered for: Tests 3, 4, and 6 with block size 8 and 64)

5%

Correct partitioning of render with no extra samples generated

(i.e. the correct image, with work area for each block either at block size or smaller at edges, and with the correct number of samples rendered for: Tests 3, 4, and 6 with block size 8 and 64)

5%

Documentation

20%

Outputs showing timing information for each stage on all applicable scene files with all thread counts (to get full marks for this part your code needs to pass all tests for all stages above)

10%

Analysis of data with respect to CPU used

(to get full marks for this part your code needs to pass all tests for all stages above)

10%

Penalties

Failure to comply with submission instructions (eg. incorrect submission of files, submission of unnecessary temporary or image files, abnormal solution/project structure, etc.)

-10%

Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated code without documentation, compiler warnings, etc.)

up to -20%

up to -100%

L-OLCnC55 [ —U Aj iOi Up LU F1UU1 uU zu KOI Up LU / UC1 jSf -LUU /u Ui lcT / UoyS/

Late assignments will not be accepted

Failure to use the techniques outlined in this unit's lectures and tutorials, resulting in a wildly different solution, using un-taught libraries (eg. STL threading classes, threadgroups, etc.), un-taught techniques (eg. creating a list of points to render and sharing that list amongst the threads), or a vastly different rendering implementation

up to -100%


Programming Style

This assignment is not focussed on programming style, but you should endeavour to follow good programming practices. You should, for example:

• comment your code;

• use sensible variables names;

• use correct and consistent indenting; and

• internally document (with comments) any notable design decisions.

[NOTE: any examples in the provided assignment materials that don't live up to the above criteria, should be considered to be deliberate examples of what not to do and are provided to aid your learning ;P]

The Assignment Task

You are to implement a "simple" raymarching-style raytracer in a multithreaded fashion (for a quick definition of raytracing see the wikiDedia Dage).

From the provided single-threaded raytracer implementation, you will create multiple subsequent versions as follows:

1. A basic multithreaded CPU implementation.

2. A dynamically balanced line-based multithreaded CPU implementation.

3. A dynamically balanced square-based multithreaded CPU implementation.

Implementation

1. Basic Multithreaded CPU Implementation

This stage involves splitting up the render across multiple threads by splitting up the render into equal sized chunks (or almost equal sized chunks when the number of threads doesn't divide evenly into the image height) with each chunk being rendered entirely by an individual thread.

In order to complete this step you will need to:

• Write code to partition the rendering job into chunks (so that each thread generates a different part of the final image).

• Write code to create and manage multiple threads.

At the end of this stage the program should be able to handle any image size (any dimensions that are multiples of four; at least, up to the maximum of 2048x2048).

To be eligible for full marks, your assignment should also use the command-line option "-threads" to set the number of threads to any value that is less than or equal to the height of the image being rendered and also respect the "-colourise" option to tint each section of the image with a set of colours (colour re-use is expected for a large number of threads).

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation

This stage involves splitting up the render into lines which are dynamically allocated to threads from the thread pool as they request more work.

In order to complete this step you will need to:

• Start as many threads as required (specified from the command-line arguments) for the thread pool.

• Manage some shared data between the threads so that each do the correct work.

3. Dynamically Balanced Square-Based Multithreaded CPU Implementation

This stage involves splitting up the render into squares which are dynamically allocated to thread from the thread pool as they request more work.

In order to complete this step you will need to:

• Partition the render into squares (requires more carefulness when writing results to the image).

NOTE: this implementation is (slighty) harder than Stage 2, but might not result in an increase in performance.

Hints/Tips

• The techniques required to complete each stage rely heavily on work done in the tutorials 2 and 3 — refer to them often.

Documentation

For each stage of the assignment you attempt you should provide:

• average (of 3 runs) timing information for each scene file for the total time taken for a render for particular thread counts. See the next section for an example format for this timing table that specifies which tests are required for which scenes; and

• an explanation of the results (e.g. why there's no difference between the performance of stages 1, 2, and 3 {NOTE: this is a made up example and isn't necessarily what to expect), or why a particular type of implementation works well (or poorly) on a particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the tests, and you should discuss how the architectural features of the CPU explain the results.

Tests / Timing

The following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests that need to be performed in order to fully complete the documentation section of the assignment.

In order to confirm your images match the images created by the base version of the assignment code, it's strongly recommended you use a image comparison tool. For part of the marking for this. Image Magick will be used:

• Image Magick: download for Windows.

• e.g. running this from the command-line to test:

magick.exe compare -metric mae Outputs\all.txt_1024xl024xl_RaymapchepAssl.exe.bmp Outputs_REFERENCE\all.txt_1024xl024x4_RaymarcherAssl.exe・bmp diff.b This produces an image ("diff.bmp") showing the differences between the two source images (with differences shown as red

pixels), and also a numeric summary of the difference, here these images are 0.580199% different):

380.233(0.00580199)

(this example deliberately compares the lxl sampled to the 4x4 sampled image to show there is a difference).

Note: to fully complete the empty cells in the table below, around 200 total images will need be calculated (each test needs 3 runs, and the first five tests need to be run with the base code, and with 8 threads for each stage of the assignment, and test 6 needs to be run for the base code, 1-8 threads, and 16 and 32 threads for stage 1, stage 2, and stage 3 with two different block sizes), so plan your time accordingly.

Note: running the base code tinning tests could take around an hour depending on your CPU, so do that early on when you get the assignment!

Timing Test

1.

-input Scenes/all.txt - size 1024 1024 - samples 1

2.

-input Scenes/all.txt - size 1024 1024 - samples 4

3.

-input Scenes/all.txt - size 500 300 -samples 1

4.

-input

Seenes/cornell.txt -size

1024 1024 -samples 1

5.

-input

ScenesZ5000spheres.txt -size 240 135 -samples

1

6.

-input

Seenes/sponge.txt -size

128 128 -samples 1