
CI5250

Parallel Programming and Data Engineering

Coursework Aim

This coursework assesses your understanding of the theory of parallel processing and of the practical application of Data Engineering.

Submission details

Please answer ALL questions in full sentences. Do not use bullet points. Submit all your answers in a single Word or PDF document via Turnitin on the module’s Canvas assignment page.

There are 25 marks in total. This assignment counts for 25% of the total mark of this module.

You are expected to meet all submission deadlines as set. However, the University recognises that unforeseen circumstances may occur, and all deadlines have a 24-hour grace period within which a late submission will be accepted with no penalty. After that, any late work submitted within 5 days of the original deadline (i.e. within 4 days of the end of the grace period) will be marked but capped at 40%. If you are ill or experience problems outside your control that prevent you from meeting the deadline, the University Mitigating Circumstances policy may apply. Please refer to MyKingston > Faculties > SEC for guidance on the Extensions and Mitigating Circumstances process. Requests for mitigation to be taken into account must be submitted with evidence through OSIS.

Coursework brief

This assignment covers multi-threading, parallel programming and Data Engineering. You need to answer ALL questions. This is an individual piece of work. For Question 1, answer in full sentences with no bullet points; for Question 2, write complete Python code. Put your answers into a PDF document with your name and your K-number at the top, and submit it via Canvas.

Multi-threading and Thread Synchronisation

Question 1: Process Scheduling using Multi-threading (15 marks)

The time taken to run multiple processes may be reduced by using several Central Processing Unit (CPU) cores in parallel. The diagram below (Figure 1) shows seven processes and the dependencies between them. Each process takes 20 seconds to run, so running all the processes in sequence on one CPU core takes 7 x 20 = 140 seconds.

The table below shows the scheduling of the processes when using a single CPU core:

Time    20s     40s     60s     80s     100s    120s    140s
CPU1    P1      P2      P3      P4      P5      P6      P7
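The single-core figures above can be reproduced with a short sketch. It assumes only what the brief states (seven processes, 20 seconds each, run back to back in the order P1 to P7); the function name `sequential_schedule` and the variable names are hypothetical and chosen for illustration. It does not model the dependencies in Figure 1, so it is not a solution to parts a) or b).

```python
# Minimal sketch: finish times for processes run one after another on one core.
RUN_TIME_S = 20  # each process takes 20 seconds (from the brief)
processes = [f"P{i}" for i in range(1, 8)]  # P1 .. P7


def sequential_schedule(names, run_time):
    """Return (name, finish_time) pairs for back-to-back execution."""
    schedule = []
    elapsed = 0
    for name in names:
        elapsed += run_time  # the core is busy for one full run per process
        schedule.append((name, elapsed))
    return schedule


schedule = sequential_schedule(processes, RUN_TIME_S)
for name, finish in schedule:
    print(f"{finish:>4}s  {name}")

total_time = schedule[-1][1]  # finish time of the last process
print(f"Total: {total_time}s")
```

With more than one core the total can only drop below this figure where the dependencies allow processes to overlap.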

a)   Please complete the following table showing the most efficient scheduling for 2 CPU cores. Remember to take into account the dependencies between the processes shown in Figure 1.

Time

CPU1

CPU2

 (5 Marks)

b)  Please complete the following table showing the most efficient scheduling for 4 CPU cores. Remember to take into account the dependencies between the processes shown in Figure 1.

Time

CPU1

CPU2

CPU3

CPU4

 (5 Marks)

c)   Please describe the importance of dependencies in the scheduling of parallel processes (approx. 100 words). (5 Marks)

 

Figure 1

Question 2: Data Structure  (10 marks)

This question focuses mainly on using Python’s Pandas and NumPy libraries.

A DataFrame is a 2D data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.

Each column in a DataFrame is a Series (a one-dimensional array of values with an index).
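The two statements above can be illustrated with a small sketch. The data is hypothetical and deliberately unrelated to the coursework table; the names `books` and `title_column` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: three columns holding text, integers and floats.
books = pd.DataFrame(
    {
        "Title": ["Dune", "Emma"],
        "Pages": [412, 474],
        "Price": [7.99, 5.49],
    }
)

# Selecting a single column returns a Series with the same index.
title_column = books["Title"]

print(books.shape)                  # (rows, columns)
print(books.dtypes)                 # one data type per column
print(type(title_column).__name__)  # Series
```

Each column keeps its own data type, which is why one DataFrame can mix text and numeric data.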

Consider the movie data table below:

Name                Genre      Rating    Gross profit
Harry Potter        Fantasy    8.4       £240,000
Troy                History    7.9       £310,000
Rush hour           Comedy     7.5       £198,000
Punisher            Action     8.1       £300,000
Fast and Furious    Action     8.7       £430,000

a)   Create a data dictionary using the data provided in the above table and name it “Movie”.

Write your code here:

(3 marks)

b)  Convert the “Movie” dictionary into a data frame and name it “Movie_df”.

Write your code here:

(1 mark)

c)   Create an index column and name it “Movie_ID” with the following values: 34, 54, 25, 67, 87.

Write your code here:

d)  Using the NumPy library, create a 2D array named “arr_1” with the following values:

3          4          9          7          6          1          4          7          4

Write your code here:

(2 marks)

e)   Sort the “arr_1” array.

Write your code here:

(1 mark)

f)   Reverse sort the “arr_1” array.

Write your code here:

(1 mark)

Academic Misconduct

Plagiarism is presenting somebody else’s work as your own. It is an offence to copy material (even a phrase or a sentence) from the Internet or from other work and publications. You must write everything in your own words. Collusion, i.e. allowing another student to use your work, even just as a template, is also an offence. There is a heavy penalty for plagiarism and collusion: you could receive a ZERO mark, and your subsequent academic record may be affected. Further details about plagiarism and referencing can be found at:

http://www.kingston.ac.uk/aboutkingstonuniversity/howtheuniversityworks/policiesandregulations/documents/Plagiarism_%20Student.pdf