闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MA40177: Scientiﬁc Computing

Assignment 2

1 Two-dimensional Gray-Scott equations

In the second assignment we consider a two-dimensional model of the nonlinear reaction- diﬀusion, focusing on aspects of parallel computations. As previously, we are looking for concentrations u, v of two substances U, V , but now they depend on two variables (x, y) e [0, 1]2 and satisfy the two-dimensional partial diﬀerential equations

= _uv2 + F (1 _ u) + Du∆u on x, y e [0, 1], t e [0, T] (1)

∂v(x, y, t) 2

∂t

u(1, y, t) = u(0, y, t), v(1, y, t) = v(0, y, t), u(x, 1, t) = u(x, 0, t), v(x, 1, t) = v(x, 0, t) u(x, y, 0) = u0 (x, y), v(x, y, 0) = v0 (x, y).

(Boundary conditions) (Initial conditions)

(3)

(4)

Here, ∆ is the Laplace operator acting on bivariate functions as ∆u = ∂2u/∂x2 + ∂2u/∂y2 . As previously, we ﬁx the diﬀusion coeﬃcients to Du = 2 . 10_5 and Dv = 10_5 , but we will vary the feed rate of the ﬁrst substance F > 0, and the kill rate of the second substance k > 0.

1.1 Discretisation in space

We use again the ﬁnite diﬀerence method, extended to two variables. We divide the domain [0, 1]2 into n = m2 cells of size h x h each, where h = 1/m. Then we deﬁne the discrete ﬁelds

ui,j = u(ih, jh, t), vi,j = v(ih, jh, t), i, j = 1, . . . , m. (5)

Since the Laplace operator is a sum of univariate derivatives, we can use the one-dimensional ﬁnite diﬀerence scheme, and approximate

∂u2 ui_1,j _ 2ui,j + ui+1,j

∂x2 h2 ,

∂u2 ui,j _1 _ 2ui,j + ui,j+1

∂y2 h2 ,

and similarly for ∆v (recall the Tutorial on Poisson’s equation). The periodic boundary conditions (3) imply that

um+1,j = u1,j, u0,j = um,j ,

vm+1,j = v1,j, v0,j = vm,j ,

ui,m+1 = ui,1, ui,0 = ui,m , vi,m+1 = vi,1, vi,0 = vi,m .

(6)

(7)

Collecting discrete values ui,j, vi,j into vectors

u(t) = [u1,1 u2,1 . . . um,1 u1,2 . . . um,2 v(t) = [v1,1 v2,1 . . . vm,1 v1,2 . . . vm,2

. . . u1,m . . . um,m]T . . . v1,m . . . vm,m]T

(8)

(9)

we can write the central diﬀerences as the matrix-vector products ∆u, ∆v with the matrix

┐'

∆ e Rm2 xm2 , (10)

' I . . . 0 I D'

where I e Rmxm is the identity matrix, and

┐'

D = '(') . . . . . . . . . '(') e Rmxm . (11)

' 1 . . . 0 1 _4'

The nonlinear reaction terms in (1),(2) can be written similarly to the ﬁrst assignment

Nu(u, v) = _u o v o v + f _ Fu, Nv(u, v) = u o v o v _ (F + k)v,

where f = [F . . . F]T e Rm2 , and “o” is the elementwise product, but now over the larger vectors (8), (9).

1.2 Time stepping

In the code of this assignment, we will use the Compressed Row Storage (CRS) of matrices, and hence we have to sacriﬁce the possibility of using direct LAPACK solvers in implicit time schemes. However, the acceleration oﬀered by parallel computing allows us to take smaller

1: Input: initial vectors u0 = u(0), v0 = v(0), time step size τ > 0, ﬁnal time T > 0.

2: for e = 0, . . . , [T/τ] _ 1 do

3: Compute

4: Compute B Predictor

5: Compute

6: Compute B Corrector

7: end for

8: Return u(T) s ue , v(T) s ve .

time steps, which render explicit time schemes stable. In particular, we are still employing the Runge-Kutta order-2 Heun method, but now apply it to the entire right hand side of

= ∆u + Nu(u, v),

= ∆v + Nv(u, v).

(12)

(13)

The pseudocode can be summarised as shown in Algorithm 1. The Heun method is stable as long as |λ|τ < 2 for all eigenvalues λ of the Jacobian of (12), (13). In our case, the largest eigenvalue is dominated by that of the Laplace matrix (10). A slightly underestimated (ignoring Nu) stability criterion is thus τu < 1/4, where τu = Duτ /h2 , the Courant number.

2 The Assignment

Copy over the ﬁles from the directory $MA40177_DIR/assignment2. They are designed to solve numerically the model problem deﬁned above. However, the subroutines in timestepping .f90 which implements Algorithm 1, and in rhs .f90 which computes the right hand side of (12), (13), are empty. It will be your task to implement them. You will also need to parallelise sequential parts of the code in create_matrix .f90, initial .f90, matmult .f90, and in the main program grayscott .f90.

Part I: Implementation

Q1: Parallelisation of Laplace matrix and initial state

Study the subroutine create_matrix in the ﬁle create_matrix .f90, and the subroutine initial in the ﬁle initial .f90. Identify what lines are speciﬁc for the sequential code, and what parts would remain unchanged in a parallel implementation. Modify both ﬁles such that the matrix and initial state are constructed eﬃciently in parallel. Hint: these tasks are easier than it looks like, i. e. only a few lines actually need changing. [4 points]

Q2: Parallelisation of the matrix-vector product

The subroutine Mat_Mult in the ﬁle matmult .f90 contains the sequential code for multiplica- tion of a CRS matrix by a vector. Modify this code to make it parallel and eﬃcient. Hint: you can build upon the sparsegather routine from Problem Sheet 6, but you need to adapt it to the Gray-Scott model (1)– (4). [4 points]

Q3: Computation of nonlinear terms and Right-Hand Side of (12), (13) Implement a subroutine RHS(u,v,F,k,Du,Dv,Delta,ru,rv) in the ﬁle rhs .f90 which takes

◆ u,v: derived type storage of parallelised real solution vectors u, v,

◆ F: real feed rate F > 0,

◆ k: real kill rate k > 0,

◆ Du,Dv: real diﬀusion coeﬃcients Du > 0 and Dv > 0,

◆ Delta: derived type CRS storage of parallelised real matrix ∆ .

For the given u and v in an appropriately parallelised form, the subroutine should compute and return the right-hand sides

ru = _u o v o v + f _ Fu + Du∆u, rv = u o v o v _ (F + k)v + Dv∆v.

in the same parallelised form in the derived-type variables ru and rv, respectively. Use Mat_Mult subroutine developed in the previous question. [6 points]

Q4: Parallel time stepping algorithm

Implement the subroutine timestepping in the ﬁle timestepping .f90 which implements the Heun method from Algorithm 1. The subroutine should take the following arguments:

◆ u,v: derived type storage of parallelised real solution vectors u, v,

◆ tau: real time step τ > 0,

◆ T: real ﬁnal time T > 0,

◆ F: real feed rate F > 0,

◆ k: real kill rate k > 0,

◆ Du,Dv: real diﬀusion coeﬃcients Du > 0 and Dv > 0,

◆ Delta: derived type CRS storage of parallelised real matrix ∆ .

Hint: use RHS routine developed in the previous question. For simplicity, you can assume that T is a multiple of the time step (T = eτ for some e e &), and that the number of grid cells m in x and y is a multiple of the number of processes nproc. Make sure ﬁrst that your code works on one process (partial credit will be given for a working sequential code.) Make your code eﬃcient and use as few vectors as possible. [6 points]

Q5: Main program and Makeﬁle

Modify the main program in grayscott .f90 such that the code can run in parallel. Make it more user–friendly by allowing the user to specify m, τ, T, F, k, Du, Dv (in this order) in a ﬁle input .dat. Modify the Makeﬁle such that it compiles parallel code correctly. Verify that your program works ﬁrst on one processor and returns positive concentrations. You can use the Fortran subroutine in save_fields .f90 and the Python script visualise .py to plot the solution to a PDF ﬁle. Bear in mind that save_fields .f90 is a sequential code.

In addition, write a 0.5-1 page report describing how you implemented, modiﬁed and tested the codes, what and why you did to make the code eﬃcient and correct. [6 points]

Insert your report on implementation here

Part II: Theory

Q6: Performance Model

Estimate the number of FLOPs and the number of memory references in one step of Algo- rithm 1 as functions of m. Using the values of tﬂop , tmem , tlat and tword from the Timings handout, estimate the time tseq of one Heun step as a function of m.

Measure the total time and deduce the time of one time step. Compare the theoretically predicted time with the measured sequential time for m = 128 and m = 512.

Now estimate in a similar way the communication time for each process in one Heun step, again as a function of m and the number of processes P . Plot the theoretically predicted computational time tpar for the parallel implementation of Algorithm 1 and the parallel eﬃ- ciency E(n, P) (where n = m2 ) for both m = 128 and m = 512, and the numbers of processes P = 1, 2, 4, 8, 16, 32. (You can use any plotting software.)

How do the results change if the performance improvements from Q10 are implemented? [6 points]

Insert your answer here and attach plots on separate sheets

Part III: Consistency and Scaling Tests

In this section, use m = 128, τ = 0.4, T = 3000, F = 0.04, k = 0.059, Du = 2 . 10_5 , Dv = 10_5 unless otherwise speciﬁed.

Q7: Empirical error analysis

Run your code (you can use one processor in this question) for τ = 0.05, 0.1, 0.2, 0.4, printing the solution u s u(1/2, 1, T), corresponding to each value of τ . Compute the error estimates u u and the experimental order of convergence (EOC)

EOCτ = log2 ╱ u,n _ u),n 、

Populate the following table with the values you obtained.

τ	u	u u	EOCτ
0.05
0.1
0.2			—
0.4		—	—

Plot the solutions u, v by invoking python visualise .py in the command line after running your program for m = 128. Copy solution .pdf to your H: drive and attach it below.

Comment on the results here and attach the plot on a separate sheet

[3 points]

Q8: Strong scaling

Set τ = 0.03, T = 300, m = 512 and calculate the solution on P = 1, 2, 4, 8, 16, 32 processes. Verify that the result does not change signiﬁcantly (compared to Table in Q7) for diﬀerent P . Measure the time spent in one step of Algorithm 1 for diﬀerent numbers of processes (hint: measure the total time and divide by the number of steps). Plot both the time and the parallel eﬃciency E(n, P), where n = m2 . Repeat the experiment for two diﬀerent values of

m (e.g. m = 128 and m = 512). Compare the measured times to the theoretical estimates in Q6.

Attach scaling plots on a separate sheet [4 points]

Q9: Weak scaling

Keep τ = 0.03 and T = 300. Starting from m = 128, successively double the number of points m in each direction and at the same time increase the number of processes by a factor of four (i.e. keep the problem size on a single process ﬁxed). Perform this experiment for P = 1, 4, 16 and for P = 2, 8, 32 processes and measure the time spent in one step of Algorithm 1. Plot this time as a function of the number of processes P and plot the scaled eﬃciency Es(n, P).

Attach scaling plots on a separate sheet [4 points]

Write a short report, summarising your results in Q8 and Q9

Q10: Enhancements

Improve the performance of your code. For example you might want to consider the following:

◆ Adjust the parallel vector data structures presented in the lectures such that each process requires memory for only its local part as well as for the two halos (above and below).

◆ Overlap calculations and communications using non-blocking MPI sends and receives. You can also try to make communication more eﬃcient by sending several vectors in one message.

◆ Memory references and divisions are around ten times more expensive than multiplica- tions, additions or subtractions. Exploit this fact by writing a matrix-free version of the algorithm. To do this write a new subroutine for applying the matrix ∆ to a vector that avoids explicit storage of the matrix and too many divisions.

For any performance enhancement you make, write a short report on how you implemented it, and provide plots or tables that conﬁrm the improvement. [7 points]

Insert discussion of further enhancements and attach plots here.

General Remarks

◆ Write routines and programs to test parts of your code.

◆ Support your observations with numerical results. You can also use graphs.

◆ Typical trends for convergence and cost with respect to some variable n are αnβ and αβn for some constants α, β e R.

◆ Use as many subroutines and functions as you deem necessary.

◆ Choose helpful names for variables, subroutines, functions, etc.

◆ Provide comments in your code where appropriate.

◆ Make sure your code is properly indented. Your text editor may be able to help you with this.

◆ “Programs must be written for people to read, and only incidentally for machines to execute.”, Abelson & Sussman, Structure and Interpretation of Computer Programs

◆ Together with your Fortran code and Makeﬁle, provide a README ﬁle explaining how to compile and run your programs.

◆ Try to make sure that it is easy to repeat the experiments you used to verify your code and to generate the results you use in your report.

2023-02-11

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths