闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MA40177: Scientiﬁc Computing

Assignment 2

Handed out: Friday, April 8th 2022

Due in: Friday, May 6th 2022 (by 17. 00 at Assignment 2 - Submission Point on Moodle )

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

This assignment is worth 50% of the total assessment for MA40177 and should take an average student about 20 hours to complete (provided they have done all the problem sheet questions and tutorial exercises). Marks given in square brackets below indicate the marks available for each part. Please provide answers in the spaces below. You may word process your work if you wish, but please still insert it into the marked spaces. No marks will be lost if it is not word processed, provided it is legible. All programs should be written in FORTRAN95. All real arithmetic should be done in double precision (kind = 8). When writing/modifying

subroutines, use the exact order of parameters speciﬁed in the questions and use the given ﬁlenames.

You should not discuss the details of your work with anyone else. The work which you hand in must be your own. You should be prepared to explain anything which you write to an examiner if asked to do so. In particular, if it is discovered that all or part of your code has been copied, both parties involved risk a severe penalty and might lose all their marks on the assignment.

IMPORTANT: When you have ﬁnished the assignment, please put all the relevant ﬁles (i.e. those you copied over and those you wrote) into a directory with a distinct name that identiﬁes you as the author, e.g. assignment2_skl27. Zip up the directory into a single ﬁle using the command

tar czvf assignment2_skl27 .tgz assignment2_skl27

and then upload the tgz-ﬁle that you obtain and a PDF ﬁle of your written work on the course Moodle page at Assignment 2 - Submission Point.

1 Two-dimensional Gray-Scott equations

In the second assignment we consider a two-dimensional model of the nonlinear reaction- diﬀusion, focusing on aspects of parallel computations. As previously, we are looking for concentrations u, v of two substances U, V , but now they depend on two variables (x, y) e [0, 1]2 and satisfy the two-dimensional partial diﬀerential equations

= _uv2 + F (1 _ u) + Du∆u on x, y e [0, 1], t e [0, T] (1)

∂v(x, y, t) 2

∂t

u(1, y, t) = u(0, y, t), v(1, y, t) = v(0, y, t), u(x, 1, t) = u(x, 0, t), v(x, 1, t) = v(x, 0, t) u(x, y, 0) = u0 (x, y), v(x, y, 0) = v0 (x, y).

(Boundary conditions) (Initial conditions)

(3)

(4)

Here, ∆ is the Laplace operator acting on bivariate functions as ∆u = ∂2u/∂x2 + ∂2u/∂y2 . As previously, we ﬁx the diﬀusion coeﬃcients to Du = 2 . 10-5 and Dv = 10-5 , but we will vary the feed rate of the ﬁrst substance F > 0, and the kill rate of the second substance k > 0.

1.1 Discretisation in space

We use again the ﬁnite diﬀerence method, extended to two variables. We divide the domain [0, 1]2 into n = m2 cells of size h × h each, where h = 1/m. Then we deﬁne the discrete ﬁelds

ui,j = u(ih, jh, t), vi,j = v(ih, jh, t), i, j = 1, . . . , m. (5)

Since the Laplace operator is a sum of univariate derivatives, we can use the one-dimensional ﬁnite diﬀerence scheme, and approximate

∂u2 ui-1,j _ 2ui,j + ui+1,j

∂x2 h2 ,

∂u2 ui,j -1 _ 2ui,j + ui,j+1

∂y2 h2 ,

and similarly for ∆v (recall the Tutorial on Poisson’s equation). The periodic boundary conditions (3) imply that

um+1,j = u1,j, u0,j = um,j ,

vm+1,j = v1,j, v0,j = vm,j ,

ui,m+1 = ui,1, ui,0 = ui,m , vi,m+1 = vi,1, vi,0 = vi,m .

(6)

(7)

Collecting discrete values ui,j, vi,j into vectors

u(t) = [u1,1 u2,1 . . . um,1 u1,2 . . . um,2 v(t) = [v1,1 v2,1 . . . vm,1 v1,2 . . . vm,2

. . . u1,m . . . um,m]T . . . v1,m . . . vm,m]T

(8)

(9)

we can write the central diﬀerences as the matrix-vector products ∆u, ∆v with the matrix

┐'

∆ e Rm2 ×m2 , (10)

' I . . . 0 I D'

where I e Rm×m is the identity matrix, and

┐'

D = '(') . . . . . . . . . '(') e Rm×m . (11)

' 1 . . . 0 1 _4'

The nonlinear reaction terms in (1),(2) can be written similarly to the ﬁrst assignment

Nu(u, v) = _u o v o v + f _ Fu, Nv(u, v) = u o v o v _ (F + k)v,

where f = [F . . . F]T e Rm2 , and “o” is the elementwise product, but now over the larger vectors (8), (9).

1.2 Time stepping

In the code of this assignment, we will use the Compressed Row Storage (CRS) of matrices, and hence we have to sacriﬁce the possibility of using direct LAPACK solvers in implicit time schemes. However, the acceleration oﬀered by parallel computing allows us to take smaller

1: Input: initial vectors u0 = u(0), v0 = v(0), time step size τ > 0, ﬁnal time T > 0.

2: for e = 0, . . . , [T/τ] _ 1 do

3: Compute

4: Compute B Predictor

5: Compute

6: Compute B Corrector

7: end for

8: Return u(T) ≈ ul , v(T) ≈ vl .

time steps, which render explicit time schemes stable. In particular, we are still employing the Runge-Kutta order-2 Heun method, but now apply it to the entire right hand side of

= ∆u + Nu(u, v),

= ∆v + Nv(u, v).

(12)

(13)

The pseudocode can be summarised as shown in Algorithm 1. The Heun method is stable as long as IλIτ < 2 for all eigenvalues λ of the Jacobian of (12), (13). In our case, the largest eigenvalue is dominated by that of the Laplace matrix (10). A slightly underestimated (ignoring Nu) stability criterion is thus τu < 1/4, where τu = Duτ /h2 , the Courant number.

2 The Assignment

Copy over the ﬁles from the directory $MA40177_DIR/assignment2. They are designed to solve numerically the model problem deﬁned above. However, the subroutines in timestepping .f90 which implements Algorithm 1, and in rhs .f90 which computes the right hand side of (12), (13), are empty. It will be your task to implement them. You will also need to parallelise sequential parts of the code in create_matrix .f90, initial .f90, matmult .f90, and in the main program grayscott .f90.

Part I: Implementation

Q1: Parallelisation of Laplace matrix and initial state

Study the subroutine create_matrix in the ﬁle create_matrix .f90, and the subroutine initial in the ﬁle initial .f90. Identify what lines are speciﬁc for the sequential code, and what parts would remain unchanged in a parallel implementation. Modify both ﬁles such that the matrix and initial state are constructed eﬃciently in parallel. Hint: these tasks are easier than it looks like, i. e. only a few lines actually need changing. [4 points]

Q2: Parallelisation of the matrix-vector product

The subroutine Mat_Mult in the ﬁle matmult .f90 contains the sequential code for multiplica- tion of a CRS matrix by a vector. Modify this code to make it parallel and eﬃcient. Hint: you can build upon the sparsegather routine from Problem Sheet 6, but you need to adapt it to the Gray-Scott model (1)– (4). [4 points]

Q3: Computation of nonlinear terms and Right-Hand Side of (12), (13) Implement a subroutine RHS(u,v,F,k,Du,Dv,Delta,ru,rv) in the ﬁle rhs .f90 which takes

◆ u,v: derived type storage of parallelised real solution vectors u, v,

◆ F: real feed rate F > 0,

◆ k: real kill rate k > 0,

◆ Du,Dv: real diﬀusion coeﬃcients Du > 0 and Dv > 0,

◆ Delta: derived type CRS storage of parallelised real matrix ∆ .

For the given u and v in an appropriately parallelised form, the subroutine should compute and return the right-hand sides

ru = _u o v o v + f _ Fu + Du∆u, rv = u o v o v _ (F + k)v + Dv∆v.

in the same parallelised form in the derived-type variables ru and rv, respectively. Use Mat_Mult subroutine developed in the previous question. [6 points]

Q4: Parallel time stepping algorithm

Implement the subroutine timestepping in the ﬁle timestepping .f90 which implements the Heun method from Algorithm 1. The subroutine should take the following arguments:

◆ u,v: derived type storage of parallelised real solution vectors u, v,

◆ tau: real time step τ > 0,

◆ T: real ﬁnal time T > 0,

◆ F: real feed rate F > 0,

◆ k: real kill rate k > 0,

◆ Du,Dv: real diﬀusion coeﬃcients Du > 0 and Dv > 0,

◆ Delta: derived type CRS storage of parallelised real matrix ∆ .

Hint: use RHS routine developed in the previous question. For simplicity, you can assume that T is a multiple of the time step (T = eτ for some e e &), and that the number of grid cells m in x and y is a multiple of the number of processes nproc. Make sure ﬁrst that your code works on one process (partial credit will be given for a working sequential code.) Make your code eﬃcient and use as few vectors as possible. [6 points]

Q5: Main program and Makeﬁle

Modify the main program in grayscott .f90 such that the code can run in parallel. Make it more user–friendly by allowing the user to specify m, τ, T, F, k, Du, Dv (in this order) in a ﬁle input .dat. Modify the Makeﬁle such that it compiles parallel code correctly. Verify that your program works ﬁrst on one processor and returns positive concentrations. You can use the Fortran subroutine in save_fields .f90 and the Python script visualise .py to plot the solution to a PDF ﬁle. Bear in mind that save_fields .f90 is a sequential code.

In addition, write a 0.5-1 page report describing how you implemented, modiﬁed and tested the codes, what and why you did to make the code eﬃcient and correct. [6 points]

Part II: Theory

Q6: Performance Model

Estimate the number of FLOPs and the number of memory references in one step of Algo- rithm 1 as functions of m. Using the values of tﬂop , tmem , tlat and tword from the Timings handout, estimate the time tseq of one Heun step as a function of m.

Measure the total time and deduce the time of one time step. Compare the theoretically predicted time with the measured sequential time for m = 128 and m = 512.

Now estimate in a similar way the communication time for each process in one Heun step, again as a function of m and the number of processes P . Plot the theoretically predicted computational time tpar for the parallel implementation of Algorithm 1 and the parallel eﬃ- ciency E(n, P) (where n = m2 ) for both m = 128 and m = 512, and the numbers of processes P = 1, 2, 4, 8, 16, 32. (You can use any plotting software.)

How do the results change if the performance improvements from Q10 are implemented? [6 points]

Part III: Consistency and Scaling Tests

In this section, use m = 128, τ = 0.4, T = 3000, F = 0.04, k = 0.059, Du = 2 . 10-5 , Dv = 10-5 unless otherwise speciﬁed.

Q7: Empirical error analysis

Run your code (you can use one processor in this question) for τ = 0.05, 0.1, 0.2, 0.4, printing the solution u ≈ u(1/2, 1, T), corresponding to each value of τ . Compute the error estimates u u and the experimental order of convergence (EOC)

EOCτ = log2 ╱ u,n _ u),n 、

Populate the following table with the values you obtained.

τ	u	u u	EOCτ
0.05
0.1
0.2			—
0.4		—	—

Plot the solutions u, v by invoking python visualise .py in the command line after running your program for m = 128. Copy solution .pdf to your H: drive and attach it below. [3 points]

Q8: Strong scaling

Set τ = 0.03, T = 300, m = 512 and calculate the solution on P = 1, 2, 4, 8, 16, 32 processes. Verify that the result does not change signiﬁcantly (compared to Table in Q7) for diﬀerent P . Measure the time spent in one step of Algorithm 1 for diﬀerent numbers of processes (hint: measure the total time and divide by the number of steps). Plot both the time and the parallel eﬃciency E(n, P), where n = m2 . Repeat the experiment for two diﬀerent values of

m (e.g. m = 128 and m = 512). Compare the measured times to the theoretical estimates in Q6. [4 points]

Q9: Weak scaling

Keep τ = 0.03 and T = 300. Starting from m = 128, successively double the number of points m in each direction and at the same time increase the number of processes by a factor of four (i.e. keep the problem size on a single process ﬁxed). Perform this experiment for P = 1, 4, 16 and for P = 2, 8, 32 processes and measure the time spent in one step of Algorithm 1. Plot this time as a function of the number of processes P and plot the scaled eﬃciency Es(n, P). [4 points]

Q10: Enhancements

Improve the performance of your code. For example you might want to consider the following:

◆ Adjust the parallel vector data structures presented in the lectures such that each process requires memory for only its local part as well as for the two halos (above and below).

2023-02-19

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths

maple

C语言