MATH0033: Numerical Methods
1. Consider the numerical solution of f(x) = 0 where f(x) = e^(x/2) - x - 1, which has a unique non-zero root α satisfying α ∈ [2, 3].
(a) Show that it is possible to approximate the root α using the bisection method
with initial interval [2, 3], and calculate the number of iterations required for the absolute error to be smaller than 10^-5.
You may assume without proof that 2^(4/3) < e < 3.
(b) Let φ(x) = 2 log (x + 1) for x > -1.
Show that x = φ(x) is consistent with f(x) = 0, and, using an appropriate theorem from lectures, prove that the fixed point iteration xk+1 = φ(xk) converges linearly to α for all initial guesses x0 ∈ [2, 3], and satisfies an error bound
|xk - α| ≤ Ck |x0 - α|, k = 0, 1, 2, . . . ,
where 0 < Ck < 1 is a k-dependent constant you should specify.
(c) Let ϕ : R → R be continuously differentiable and let β ∈ R be a fixed point of ϕ s.t. |ϕ′(β)| > 1. Prove that the iteration xk+1 = ϕ(xk), k = 0, 1, 2, . . ., does not converge to β for any initial guess x0 ∈ R, unless xk = β for some k. Hint: argue by contradiction, using the mean value theorem.
SOLUTION:
(a) The function f is continuous with f(2) = e - 3 < 0 and f(3) = e^(3/2) - 4 > 0 (since 2^(4/3) < e < 3), so that f(2)f(3) < 0. Hence the bisection method applies, and the absolute error |xk - α| after the kth iteration (with xk defined to be the midpoint of the interval under consideration at the kth iteration) is bounded by (3 - 2)/2^(k+1) = 2^-(k+1). Hence to ensure the error is smaller than 10^-5 it suffices to take k to be the smallest integer such that 2^(k+1) > 10^5, i.e. k > 5 log 10/log 2 - 1 ≈ 15.61, so k = 16 iterations suffice.
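The iteration count can be confirmed with a short computation; the following is an illustrative sketch of the method, not part of the model solution:

```python
import math

# Bisection for f(x) = exp(x/2) - x - 1 on the initial interval [2, 3].
def f(x):
    return math.exp(x / 2) - x - 1

a, b = 2.0, 3.0
assert f(a) < 0 < f(b)  # sign change, so the bisection method applies

# Smallest integer k with 2**(k+1) > 1e5, i.e. k > 5*log(10)/log(2) - 1:
k_needed = math.ceil(5 * math.log(10) / math.log(2) - 1)  # = 16

for _ in range(k_needed + 1):
    m = (a + b) / 2
    if f(a) * f(m) <= 0:
        b = m
    else:
        a = m
# m now approximates the root alpha to well within 10**-5
```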
(b) To check consistency we first note that all roots of f(x) = 0 satisfy x > -1 (since e^(x/2) > 0 for all x) and all fixed points of φ also satisfy x > -1 (since log(x + 1) is real if and only if x > -1). Hence, to show consistency it is enough to note that, for x > -1,
f(x) = 0 ⟺ e^(x/2) = x + 1 ⟺ x/2 = log(x + 1) ⟺ x = φ(x).
That the fixed point iteration xk+1 = φ(xk) converges to α for every initial guess x0 ∈ [2, 3] follows from the contraction mapping theorem. We first have to show that φ maps [2, 3] to [2, 3]. For this it is enough to check that φ(2) = 2 log 3 > 2 (since e < 3 implies log 3 > 1), φ(3) = 2 log 4 = 4 log 2 < 3 (since e > 2^(4/3) implies e^(3/4) > 2, i.e. log 2 < 3/4), and φ is monotonic on [2, 3] since φ′(x) = 2/(x + 1) > 0 for all x ∈ [2, 3]. We then have to show that φ is a contraction on [2, 3]. For this it is enough to note that φ is differentiable with |φ′(x)| = |2/(x + 1)| ≤ |2/(2 + 1)| = 2/3 for x ∈ [2, 3]. The contraction mapping theorem also provides the linear convergence estimate |xk - α| ≤ (2/3)|xk-1 - α|, from which it follows by induction that |xk - α| ≤ Ck |x0 - α|, with Ck = (2/3)^k.
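A short numerical check of this error bound (illustrative only; the starting points and iteration counts below are arbitrary choices):

```python
import math

# Fixed-point iteration x_{k+1} = phi(x_k) with phi(x) = 2*log(x + 1).
phi = lambda x: 2 * math.log(x + 1)

# Approximate the root alpha to machine precision by iterating long enough.
alpha = 2.5
for _ in range(200):
    alpha = phi(alpha)

# Check the error bound |x_k - alpha| <= (2/3)**k * |x_0 - alpha|.
x = 3.0
e0 = abs(x - alpha)
for k in range(1, 25):
    x = phi(x)
    assert abs(x - alpha) <= (2 / 3) ** k * e0 + 1e-12
```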
(c) If xk = β for some k then xk+j = β for all j > 0 and we have convergence. So now assume xk ≠ β for every k, and suppose, aiming for a contradiction, that xk → β as k → ∞. Then for every ε > 0 there exists kε ∈ N such that |xk - β| < ε for all k ≥ kε. Also, since |ϕ′(β)| > 1 and ϕ′ is continuous, there exists ε* > 0 such that |ϕ′(x)| > 1 for all x ∈ (β - ε*, β + ε*). By the mean value theorem we can write

|xk+1 - β| = |ϕ(xk) - ϕ(β)| = |ϕ′(ξk)| |xk - β|

for some ξk between xk and β. Hence |xk+1 - β| > |xk - β| for all k ≥ kε* (since k ≥ kε* implies ξk ∈ (β - ε*, β + ε*), so that |ϕ′(ξk)| > 1, and both |xk+1 - β| and |xk - β| are non-zero). But this contradicts the assumption that xk → β as k → ∞, and therefore this assumption must have been false, i.e. the sequence xk cannot converge to β.
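The mechanism behind this proof can be seen in a toy example; here ϕ(x) = 2x is a hypothetical choice (not from the exam) with repelling fixed point β = 0:

```python
# phi(x) = 2*x has fixed point beta = 0 with |phi'(0)| = 2 > 1.
phi = lambda x: 2 * x

x = 1e-12               # start very close to (but not at) beta
dists = [abs(x)]
for _ in range(20):
    x = phi(x)
    dists.append(abs(x))

# |x_k - beta| grows strictly at every step, so x_k cannot converge to beta
grows = all(d1 > d0 for d0, d1 in zip(dists, dists[1:]))
```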
2. Consider the stationary iterative method
xk+1 = Bxk + c, k = 0, 1, 2, . . . ,
for the numerical solution of the linear system
Ax = b.
Here A, B ∈ R^(n×n) and x, b, c ∈ R^n for some n ∈ N.
(a) Given a vector norm |·| : R^n → R, write down the definition of the induced matrix norm |·| : R^(n×n) → R.
(b) Assuming the method is consistent, prove that if |B| < 1 then |x - xk| → 0 as k → ∞.
(c) What does it mean for the matrix A to be strictly diagonally dominant (SDD) by rows?
(d) Write down the definition of the iteration matrix B for the Jacobi method. Prove that if A is SDD then the Jacobi method converges.
You may use without proof the fact that the matrix norm induced by the vector norm |v|_∞ := max_{i=1,...,n} |vi| satisfies

|M|_∞ = max_{i=1,...,n} Σ_{j=1}^{n} |mij|, M = (mij) ∈ R^(n×n).
(e) Now consider the system of nonlinear equations
f1(x1, x2) = x1^2 - x2 = 0,
f2(x1, x2) = x2^2 + x1 - 1 = 0,
which can be written as f(x) = 0 with f = (f1, f2)^T and x = (x1, x2)^T. Write down the Newton iteration for the solution of the system f(x) = 0. Give a sufficient condition on the initial guess x^(0) guaranteeing that the first Newton step can be computed using a Jacobi iteration.
One root of the system lies in the quadrant x1 > 0, x2 > 0. Does the Newton iteration converge linearly or quadratically to this root (for a sufficiently good initial guess)?
SOLUTION:
(a) The induced matrix norm is defined by

|A| := sup_{x ∈ R^n, x ≠ 0} |Ax| / |x| = sup_{x ∈ R^n, |x| = 1} |Ax|.
(Either formula is acceptable. Note also that “sup” can be replaced by “max” since we are in finite dimensions.)
(b) Consistency means that x = Bx + c. Hence x - xk+1 = Bx + c - (Bxk + c) = B(x - xk ), so by induction
x - xk = B^k (x - x0), k = 0, 1, 2, . . . .
Then
|x - xk| ≤ |B^k| |x - x0| ≤ |B|^k |x - x0|, so if |B| < 1 it follows that |x - xk| → 0 as k → ∞.
(c) The matrix A is SDD by rows if, for each i = 1, . . . , n,
|aii| > Σ_{j=1, j≠i}^{n} |aij|.
(d) The Jacobi method has iteration matrix
B = I - D^{-1} A,

where D is the diagonal of A, so that the entries of B satisfy bii = 0 and bij = -aij/aii for i ≠ j. Using the formula in the hint,
|B|_∞ = max_{i=1,...,n} Σ_{j≠i} |bij| = max_{i=1,...,n} Σ_{j≠i} |aij|/|aii| = max_{i=1,...,n} (1/|aii|) Σ_{j≠i} |aij| < 1,
where the final inequality holds because A is SDD. Applying the result in part (b) (and noting that the Jacobi method is consistent), we conclude that the Jacobi iteration converges.
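As an illustration, a Jacobi sweep on a small matrix that is SDD by rows converges as the theory predicts (the 3×3 example below is hypothetical, not from the exam):

```python
# A is strictly diagonally dominant by rows: 4 > 1+1, 5 > 1+2, 3 > 0+1,
# so ||B_J||_inf = max(2/4, 3/5, 1/3) = 0.6 < 1 and Jacobi converges.
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 8.0, 4.0]
n = len(A)

x = [0.0] * n
for _ in range(100):
    # One Jacobi sweep: x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii
    x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
         for i in range(n)]

residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```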
(e) The Newton iteration for nonlinear systems takes the general form
x^(n+1) = x^(n) - [Df(x^(n))]^{-1} f(x^(n)), n = 0, 1, . . . ,
where

Df(x) = [∂f1/∂x1(x)  ∂f1/∂x2(x); ∂f2/∂x1(x)  ∂f2/∂x2(x)].
For this particular example,
Df(x) = [2x1  -1; 1  2x2],
so the explicit form of the iteration is
(x1^(n+1), x2^(n+1))^T = (x1^(n), x2^(n))^T - [2x1^(n)  -1; 1  2x2^(n)]^{-1} ((x1^(n))^2 - x2^(n), (x2^(n))^2 + x1^(n) - 1)^T.
From (d), a sufficient condition guaranteeing that the first Newton step can be found using a Jacobi iteration is that the matrix Df(x^(0)) is strictly diagonally dominant by rows. This holds if and only if |x1^(0)| > 1/2 and |x2^(0)| > 1/2. (An alternative approach is to determine conditions under which the spectral radius of the Jacobi iteration matrix associated with Df(x^(0)) is less than one. Students should be given full credit if they manage to follow this approach successfully and obtain a valid sufficient condition. But it is much more laborious than using (d).)
Since the determinant det(Df(x)) = 4x1x2 + 1 is positive (in particular, non-zero) for x1 > 0 and x2 > 0, the Jacobian matrix is nonsingular at the root in question. Hence the general theory from lectures tells us that the method converges quadratically to the root, for a sufficiently good initial guess.
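A sketch of the iteration for this particular system, with the 2×2 Jacobian inverted in closed form (the starting point (1, 1) is an illustrative choice satisfying |x1^(0)|, |x2^(0)| > 1/2):

```python
# Newton iteration for f1 = x1^2 - x2, f2 = x2^2 + x1 - 1, using
# [Df]^{-1} = (1/det) * [[2*x2, 1], [-1, 2*x1]] with det = 4*x1*x2 + 1.
x1, x2 = 1.0, 1.0
for _ in range(10):
    f1 = x1 ** 2 - x2
    f2 = x2 ** 2 + x1 - 1
    det = 4.0 * x1 * x2 + 1.0       # nonzero in the positive quadrant
    d1 = (2.0 * x2 * f1 + f2) / det
    d2 = (-f1 + 2.0 * x1 * f2) / det
    x1, x2 = x1 - d1, x2 - d2
# (x1, x2) now approximates the root in the quadrant x1 > 0, x2 > 0
```

The quadratic convergence claimed above is visible in practice: a handful of steps already reaches machine precision.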
3. (a) Consider the linear system Mx = b where M = [3  2; 1  µ], µ ≠ 0 is a real parameter, and b ∈ R^2.
Write down the Jacobi and the Gauss-Seidel methods for the iterative solution of the linear system in the form Pxk+1 = Nxk + b where P and N are matrices you should specify. Without computing the iteration matrices, give a sufficient condition on the parameter µ so that the Jacobi and the Gauss-Seidel methods are both convergent.
(b) Compute the iteration matrices BJ and BGS for the Jacobi and the Gauss-Seidel methods respectively. Using information from the iteration matrices, determine a necessary and sufficient condition on µ for the Jacobi and Gauss-Seidel methods to be convergent. Which method is expected to converge faster?
(c) Let A be a symmetric positive definite matrix. Consider the gradient method for the solution of Ax = b, defined by xk+1 = xk + αk rk, where rk = b - Axk and αk is the step length. Prove that the optimal choice of αk, minimising |ek+1|_A, where ek = x - xk and |v|_A^2 := v^T A v = (Av, v), is

αk = |rk|_2^2 / |rk|_A^2.
Hint: First show that Aek = rk and rk+1 = rk - αk Ark .
SOLUTION:
(a) Given an initial guess x0 ∈ R^2, the Jacobi iteration is
PJ xk+1 = NJ xk + b,
where
PJ = [3  0; 0  µ], NJ = [0  -2; -1  0],
and the Gauss-Seidel iteration is
PGS xk+1 = NGS xk + b,
where
PGS = [3  0; 1  µ], NGS = [0  -2; 0  0].
Both the Jacobi and the Gauss-Seidel methods converge if M is strictly diagonally dominant by rows, i.e. if |µ| > 1.
(b) The iteration matrices are
BJ = PJ^{-1} NJ = [1/3  0; 0  1/µ][0  -2; -1  0] = [0  -2/3; -1/µ  0],
and
BGS = PGS^{-1} NGS = [3  0; 1  µ]^{-1}[0  -2; 0  0] = [1/3  0; -1/(3µ)  1/µ][0  -2; 0  0] = [0  -2/3; 0  2/(3µ)].
The eigenvalues of BJ satisfy

det(BJ - λI) = 0, i.e. λ^2 - 2/(3µ) = 0,

so that the spectral radius ρ(BJ) = √(2/(3|µ|)).
The eigenvalues of BGS satisfy

det(BGS - λI) = 0, i.e. λ(λ - 2/(3µ)) = 0, i.e. λ1 = 0, λ2 = 2/(3µ),

so that ρ(BGS) = 2/(3|µ|).
To obtain necessary and sufficient conditions for convergence we determine when the spectral radii of the iteration matrices are smaller than one. In both cases the condition for convergence is |µ| > 2/3.
When the methods converge, on the basis of eigenvalues Gauss-Seidel is expected to converge faster than Jacobi because it gives a smaller spectral radius: ρ(BGS) = ρ(BJ)^2 < ρ(BJ), since x^2 < x for 0 < x < 1.
(But note that the iteration matrices are not symmetric so we do not have a rigorous linear convergence result in the 2-norm with the spectral radius as a convergence constant.)
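A numerical check for the sample value µ = 1 (an arbitrary illustrative choice with |µ| > 2/3, so both methods converge):

```python
# M = [3 2; 1 mu] with mu = 1; b is chosen so the exact solution is (1, 1).
mu = 1.0
b = [5.0, 1.0 + mu]

def jacobi_step(x):
    return [(b[0] - 2.0 * x[1]) / 3.0, (b[1] - x[0]) / mu]

def gauss_seidel_step(x):
    y0 = (b[0] - 2.0 * x[1]) / 3.0   # first component uses old x[1]
    return [y0, (b[1] - y0) / mu]     # second component uses new y0

def count_iterations(step, tol=1e-10):
    x, k = [0.0, 0.0], 0
    while max(abs(x[0] - 1.0), abs(x[1] - 1.0)) > tol:
        x, k = step(x), k + 1
    return k

k_jacobi = count_iterations(jacobi_step)
k_gs = count_iterations(gauss_seidel_step)
# k_gs is about half of k_jacobi, consistent with rho(B_GS) = rho(B_J)**2
```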
(c) Following the hint we first note that by the definitions of rk and ek we have Aek = A(x - xk ) = b - Axk = rk
(which implies also that ek = A-1 rk ) and
rk+1 = b - Axk+1 = b - A(xk + αk rk ) = rk - αk Ark .
Then applying these identities we see that
|ek+1|_A^2 = (Aek+1, ek+1)
= (rk+1, A^{-1} rk+1)
= (rk - αk Ark, A^{-1} rk - αk rk)
= (rk, A^{-1} rk) - αk (rk, rk) - αk (Ark, A^{-1} rk) + αk^2 (Ark, rk)
= (Aek, ek) - αk (rk, rk) - αk (rk, rk) + αk^2 (Ark, rk)
= |ek|_A^2 - 2αk |rk|_2^2 + αk^2 |rk|_A^2,

where we used the symmetry of A to write (Ark, A^{-1} rk) = (rk, rk).
This is a quadratic function of αk, which is minimised at the critical point where d(|ek+1|_A^2)/dαk = 0, i.e. where -2|rk|_2^2 + 2αk |rk|_A^2 = 0, which gives

αk = |rk|_2^2 / |rk|_A^2.
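The derivation can be exercised on a small sanity check; the SPD matrix below is a hypothetical example, not from the exam:

```python
# Gradient method for A x = b with optimal step alpha = (r, r) / (A r, r).
A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive definite
b = [1.0, 2.0]                 # exact solution: x = (1/11, 7/11)

def matvec(v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

x = [0.0, 0.0]
for _ in range(100):
    Ax = matvec(x)
    r = [b[0] - Ax[0], b[1] - Ax[1]]   # r_k = b - A x_k
    rr = dot(r, r)
    if rr == 0.0:
        break
    alpha = rr / dot(matvec(r), r)     # the optimal step derived above
    x = [x[0] + alpha * r[0], x[1] + alpha * r[1]]
```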
4. For the solution of the Cauchy problem

y′(t) = f(t, y(t)), t > 0, y(0) = y0,

consider the following family of numerical methods, parametrized by 0 ≤ θ ≤ 1:

un+1 = un + h((1 - θ)f(tn, un) + θf(tn+1, un+1)), n = 0, 1, 2, . . . , u0 = y0. (*)
(a) For which θ is the method explicit?
(b) State the values of θ for which (*) coincides with the forward Euler, backward
Euler and Crank-Nicolson methods respectively.
(c) State the definition of truncation error for a general one-step method. Show that the method (*) is of second order if θ = 1/2, and of first order otherwise. State clearly the smoothness assumptions under which your analysis is valid.
(d) State the definition of absolute stability for a general one-step method. Prove that the method (*) is unconditionally absolutely stable if θ ≥ 1/2 and conditionally absolutely stable if θ < 1/2. In the latter case, determine a necessary and sufficient condition on h for the method to be absolutely stable.
SOLUTION:
(a) The method is explicit if and only if θ = 0.
(b) With θ = 0 this is forward Euler, with θ = 1 this is backward Euler, and with
θ = 1/2 this is Crank-Nicolson.
(c) The truncation error of a one-step method un+1 = un + hΨ(tn, un, un+1; h) is

Tn = (yn+1 - yn)/h - Ψ(tn, yn, yn+1; h),
where yn = y(tn ) denotes the exact solution of the Cauchy problem at time tn . For the θ-method,
Ψ(tn , yn , yn+1; h) = g(h) := (1 - θ)f (tn , yn ) + θf (tn + h, y(tn + h)).
Taylor expansion gives (with y′n denoting y′(tn) etc.)

yn+1 = yn + h y′n + (h^2/2) y″n + (h^3/6) y‴(tn + ξ) for some ξ ∈ (0, h),

and

g(h) = g(0) + h g′(0) + (h^2/2) g″(η) for some η ∈ (0, h),

so that

Tn = (y′n - g(0)) + h((1/2) y″n - g′(0)) + h^2((1/6) y‴(tn + ξ) - (1/2) g″(η)).
The O(1) term vanishes for all θ because g(0) = f(tn, yn) = y′n (since y′ = f). Looking at the O(h) term we note that, by the chain rule,
g′(h) = θ (∂f/∂t + f ∂f/∂y)│t=tn+h, y=y(tn+h),    (**)
so that

g′(0) = θ (∂f/∂t(tn, yn) + f(tn, yn) ∂f/∂y(tn, yn)) = θ y″n,

where the final equality comes from applying the chain rule to y′ = f.
Thus the O(h) term is

h(1/2 - θ) y″n,
which vanishes if θ = 1/2, but not if θ ≠ 1/2 (unless y″n = 0, which doesn’t hold in general). That is, for sufficiently smooth y and f, Tn = O(h^2) for θ = 1/2 and Tn = O(h) for θ ≠ 1/2.
Regarding smoothness assumptions, the analysis for θ ≠ 1/2 is valid provided that y″ and g′ exist and are bounded. Recalling (**), the boundedness assumption on g′ is satisfied if f and its first order partial derivatives are bounded.
The analysis for θ = 1/2 is valid provided that y‴ and g″ exist and are bounded. By the chain rule,
g″(h) = θ (∂²f/∂t² + 2f ∂²f/∂t∂y + f² ∂²f/∂y² + (∂f/∂t + f ∂f/∂y) ∂f/∂y)│t=tn+h, y=y(tn+h),

so the boundedness assumption on g″ is satisfied if f and its partial derivatives up to second order are bounded.
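The predicted orders can be observed on a test problem; the sketch below (y′ = -y with y(0) = 1 is an illustrative choice, not from the exam) halves h and measures the error at t = 1:

```python
import math

def theta_error(theta, N):
    # theta-method applied to y' = -y, y(0) = 1, on [0, 1]:
    # u_{n+1} = u_n * (1 - (1 - theta)*h) / (1 + theta*h)
    h = 1.0 / N
    u = 1.0
    for _ in range(N):
        u = u * (1 - (1 - theta) * h) / (1 + theta * h)
    return abs(u - math.exp(-1.0))   # error at t = 1

# Halving h should roughly halve the error for theta != 1/2 (order 1)
# and roughly quarter it for theta = 1/2 (order 2).
ratio_fe = theta_error(0.0, 100) / theta_error(0.0, 200)   # ~2
ratio_be = theta_error(1.0, 100) / theta_error(1.0, 200)   # ~2
ratio_cn = theta_error(0.5, 100) / theta_error(0.5, 200)   # ~4
```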
(d) A method is absolutely stable if, when applied to the model problem

y′(t) = λy(t), t > 0, y(0) = 1,

where λ < 0, it generates a sequence of approximations un which satisfy un → 0 as n → ∞.
Applying the θ-method to the model problem (note f(t, y(t)) = λy(t)) gives un+1 = un + hλ((1 - θ)un + θun+1),
i.e.
un+1 = ((1 + (1 - θ)hλ)/(1 - θhλ)) un.
Hence
un = ((1 + (1 - θ)hλ)/(1 - θhλ))^n u0, n = 1, 2, . . . .
Thus the method is absolutely stable if and only if

-1 < (1 + (1 - θ)hλ)/(1 - θhλ) < 1

(note that 1 - θhλ > 0, since λ < 0 and h > 0).
The right-hand inequality is equivalent to hλ < 0, which is always satisfied since λ < 0 and h > 0. The left-hand inequality is equivalent to
(2θ - 1)hλ < 2.
If θ ≥ 1/2 then this inequality holds for all h > 0 (since the LHS is ≤ 0 and the RHS is > 0). But if θ < 1/2 then it holds if and only if

h < 2/((1 - 2θ)|λ|).
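These conditions can be checked directly on the amplification factor; the value λ = -10 below is an illustrative choice:

```python
# Amplification factor of the theta-method for y' = lam*y:
# R = (1 + (1 - theta)*h*lam) / (1 - theta*h*lam); stable iff |R| < 1.
lam = -10.0

def R(theta, h):
    return (1 + (1 - theta) * h * lam) / (1 - theta * h * lam)

# theta = 0 (forward Euler): the bound gives stability iff h < 2/|lam| = 0.2
stable_small = abs(R(0.0, 0.1)) < 1   # h = 0.1 < 0.2: stable
stable_large = abs(R(0.0, 0.3)) < 1   # h = 0.3 > 0.2: unstable
# theta = 1 (backward Euler): absolutely stable for every h > 0
stable_be = abs(R(1.0, 10.0)) < 1
```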
5. (a) Define the 2-norm condition number K2(A) of an invertible matrix A ∈ R^(n×n). Show that if Ax = b with b ≠ 0 then for any x̂ ∈ R^n,

|x̂ - x|_2 / |x|_2 ≤ K2(A) |Ax̂ - b|_2 / |b|_2.
(b) Write down a definition involving eigenvalues of what it means for a matrix A ∈ R^(n×n) to be symmetric positive definite (SPD).
Prove that if A is SPD then A is invertible, A^{-1} is SPD, and

K2(A) = λmax / λmin,

where λmax and λmin are the maximum and minimum eigenvalues of A.
(c) Consider the second order boundary value problem

-y″(x) + µ y(x) = f(x), x ∈ (0, 1), y(0) = y(1) = 0,

where µ > 0 and f is continuous on [0, 1], and the finite difference method
(-un-1 + 2un - un+1)/h^2 + µun = fn, n = 1, . . . , N - 1, u0 = uN = 0,

where N ∈ N, h = 1/N, xn = nh, fn = f(xn) and un is the approximation of y(xn), for each n = 0, . . . , N.
Write the method as an (N - 1)-by-(N - 1) linear system of the form Au = f , specifying the system matrix A and the vectors u and f .
(d) In the special case µ = 0 the eigenvalues of A are given by the formula

λn = (2/h^2)(1 + cos(nπ/N)), n = 1, . . . , N - 1.
Use this fact to prove that

K2(A) = 4N^2/π^2 + O(1), N → ∞.
In light of (a), what does this tell you about the usefulness of a residual-based stopping criterion when solving the system Au = f using an iterative method, when N is large?
SOLUTION:
(a) The condition number is K2(A) := |A|_2 |A^{-1}|_2, where |·|_2 : R^(n×n) → R is the matrix norm induced by the vector 2-norm (Euclidean norm) on R^n. To prove the required bound we note that
|x̂ - x|_2 = |A^{-1}(Ax̂ - b)|_2 ≤ |A^{-1}|_2 |Ax̂ - b|_2

and

|b|_2 = |Ax|_2 ≤ |A|_2 |x|_2.

Combining these gives

|x̂ - x|_2 / |x|_2 ≤ |A|_2 |A^{-1}|_2 |Ax̂ - b|_2 / |b|_2 = K2(A) |Ax̂ - b|_2 / |b|_2.
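As a numerical aside on part (d) (assuming the eigenvalue formula as stated, with a proportionality factor 2/h^2 that cancels in the ratio), the growth K2(A) ≈ 4N^2/π^2 can be checked directly:

```python
import math

def k2(N):
    # Eigenvalues of A (mu = 0) are proportional to 1 + cos(n*pi/N),
    # n = 1, ..., N-1; the constant factor cancels in max/min.
    eigs = [1 + math.cos(n * math.pi / N) for n in range(1, N)]
    return max(eigs) / min(eigs)

ratios = [k2(N) / (4 * N ** 2 / math.pi ** 2) for N in (10, 100, 1000)]
# ratios tend to 1, so K2(A) grows like 4*N**2/pi**2: for large N a small
# relative residual no longer guarantees a small relative error in u.
```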
2022-05-07