Homework 1 - ELEC 475-575
Solutions
1 Exercise
Show that:
$$I(X, Y) = \mathbb{E}_{(X,Y)}\left[\log\left(\frac{f_{X,Y}(X, Y)}{f_X(X)\, f_Y(Y)}\right)\right] \geq 0. \tag{1.1}$$
Solution 1:
Let us first establish an inequality for $\log(x)$, valid for all $x > 0$; recall that by definition any density $f$ satisfies $f(x) \geq 0$ for all $x$.
We know that $\log$ is differentiable and concave, so it is bounded above by its first-order Taylor approximation:
$$\log(y) \leq \log(x) + \frac{y - x}{x}, \qquad \forall x, y \in \mathbb{R}_+^*. \tag{1.2}$$
Therefore, for $x = 1$ and for all $y \in \mathbb{R}_+^*$, we have
$$\log(y) \leq y - 1.$$
Then, we have
$$I(X, Y) = \mathbb{E}_{(X,Y)}\left[\log\left(\frac{f_{X,Y}(X, Y)}{f_X(X)\, f_Y(Y)}\right)\right] \tag{1.3}$$
$$= \mathbb{E}_{(X,Y)}\left[-\log\left(\frac{f_X(X)\, f_Y(Y)}{f_{X,Y}(X, Y)}\right)\right]. \tag{1.4}$$
Let us now prove that for any real-valued $\mu$-integrable functions $f$ and $g$, if $g \leq f$ then $\int g\,d\mu \leq \int f\,d\mu$, for any positive measure $\mu$.
Proof. We have,
$$f = f^+ - f^-, \tag{1.5}$$
where $f^+(x) = \max(f(x), 0)$ and $f^-(x) = \max(-f(x), 0)$, and we know that for any $\mu$-integrable $f$ we have
$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu. \tag{1.6}$$
Since $f^+ \geq g^+$ and both are non-negative functions, we have
$$\int f^+\,d\mu \geq \int g^+\,d\mu. \tag{1.7}$$
Moreover, $-f(x) \leq -g(x)$, thus
$$\max(-f(x), 0) \leq \max(-g(x), 0), \tag{1.8}$$
and since both sides are non-negative functions,
$$\int \max(-f(x), 0)\,d\mu \leq \int \max(-g(x), 0)\,d\mu. \tag{1.9}$$
Therefore,
$$-\int f^-\,d\mu \geq -\int g^-\,d\mu. \tag{1.10}$$
As a result,
$$\int f^+\,d\mu - \int f^-\,d\mu \geq \int g^+\,d\mu - \int g^-\,d\mu, \tag{1.11}$$
that is, $\int f\,d\mu \geq \int g\,d\mu$.
From the result just proved, and denoting by $f_{X,Y}$ the Radon–Nikodym derivative of the joint distribution of $(X, Y)$ with respect to the Lebesgue measure $d\lambda$, we have
$$I(X, Y) \geq \mathbb{E}_{(X,Y)}\left[1 - \frac{f_X(X)\, f_Y(Y)}{f_{X,Y}(X, Y)}\right] \tag{1.12}$$
$$\geq \int f_{X,Y}(x, y)\left(1 - \frac{f_X(x)\, f_Y(y)}{f_{X,Y}(x, y)}\right)d\lambda \tag{1.13}$$
$$\geq \int f_{X,Y}(x, y)\,d\lambda - \int f_X(x)\, f_Y(y)\,d\lambda. \tag{1.14}$$
Since, by definition of density functions, both $f_X$ and $f_Y$ are $\lambda$-integrable, Fubini's theorem applies. In addition, a density function integrates to 1, thus
$$I(X, Y) \geq \int\!\!\int f_{X,Y}(x, y)\,dx\,dy - \int\left(\int f_X(x)\, f_Y(y)\,dx\right)dy \tag{1.15}$$
$$\geq 0. \tag{1.16}$$
Solution 2:
$$I(X, Y) = \mathbb{E}_{(X,Y)}\left[-\log\left(\frac{f_X(X)\, f_Y(Y)}{f_{X,Y}(X, Y)}\right)\right]. \tag{1.17}$$
Since $-\log$ is a convex function, Jensen's inequality gives
$$I(X, Y) = \mathbb{E}_{(X,Y)}\left[-\log\left(\frac{f_X(X)\, f_Y(Y)}{f_{X,Y}(X, Y)}\right)\right] \tag{1.18}$$
$$\geq -\log\left(\int\!\!\int f_{X,Y}(x, y)\,\frac{f_X(x)\, f_Y(y)}{f_{X,Y}(x, y)}\,dx\,dy\right) \tag{1.19}$$
$$\geq -\log\left(\int\!\!\int f_X(x)\, f_Y(y)\,dx\,dy\right). \tag{1.20}$$
By Fubini's theorem and the fact that a density function integrates to 1, we find that
$$I(X, Y) \geq 0. \tag{1.21}$$
Notice that equality holds if the two random variables $X$ and $Y$ are independent.
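As a quick numerical sanity check (not part of the original solution), the sketch below estimates the expectation in (1.1) by Monte Carlo for a bivariate Gaussian pair and compares it with the known closed form $-\tfrac{1}{2}\log(1-\rho^2)$; the correlation value, seed, and sample size are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 200_000                       # illustrative correlation and sample size

# Sample (X, Y) from a zero-mean bivariate Gaussian with unit variances.
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
x, y = xy[:, 0], xy[:, 1]

# log f_{X,Y}(x, y) and log f_X(x) + log f_Y(y) for the Gaussian densities.
log_joint = -0.5 * (np.log((2 * np.pi) ** 2 * np.linalg.det(cov))
                    + np.einsum("ij,jk,ik->i", xy, np.linalg.inv(cov), xy))
log_marg = -0.5 * (np.log(2 * np.pi) + x ** 2) - 0.5 * (np.log(2 * np.pi) + y ** 2)

mi_mc = np.mean(log_joint - log_marg)        # Monte Carlo estimate of (1.1)
mi_closed = -0.5 * np.log(1 - rho ** 2)      # known closed form for this Gaussian case
print(f"Monte Carlo I(X,Y) ~ {mi_mc:.4f} nats, closed form {mi_closed:.4f} nats")
```

Both values are non-negative, and they vanish exactly when $\rho = 0$, i.e. when $X$ and $Y$ are independent.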
2 Exercise
The differential entropy of a random variable $X$ with density $f$ is defined as:
$$h(X) = -\int \log(f(x))\,dP_X. \tag{2.1}$$
2.1 First Question
- Real case:
Given $X \sim \mathcal{N}(\mu, \sigma^2)$.
Proof.
$$h(X) = -\int \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\right)dP_X \tag{2.2}$$
$$= -\int \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)dP_X + \int \frac{(x - \mu)^2}{2\sigma^2}\,dP_X \tag{2.3}$$
$$= -\log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \frac{1}{2\sigma^2}\,\mathbb{E}\left((X - \mathbb{E}(X))^2\right) \tag{2.4}$$
$$= -\log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \frac{1}{2} \tag{2.5}$$
$$= \frac{1}{2}\log\left(2\pi\sigma^2 \exp(1)\right). \tag{2.6}$$
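As a small cross-check of (2.6), not required by the exercise, one can estimate $h(X) = \mathbb{E}[-\log f(X)]$ by Monte Carlo and compare with $\tfrac{1}{2}\log(2\pi\sigma^2 e)$; the values of $\mu$, $\sigma$, and the sample size below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 1.5, 500_000            # illustrative parameters

x = rng.normal(mu, sigma, size=n)
# h(X) = E[-log f(X)] estimated by the sample mean of -log f(x_i).
log_f = -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
h_mc = -np.mean(log_f)

h_closed = 0.5 * np.log(2 * np.pi * sigma ** 2 * np.e)    # value given by (2.6)
print(f"Monte Carlo h(X) ~ {h_mc:.4f} nats, closed form {h_closed:.4f} nats")
```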
- Complex case:
Proof. We consider a circularly symmetric complex Gaussian random variable $Z$. Then $Z = X + iY$, where $X$ and $Y$ are i.i.d. centered Gaussian variables. The differential entropy of $Z$ is given by the joint differential entropy $h(X, Y)$. Since $X$ and $Y$ are independent random variables, we have:
h(Z) = h(X) + h(Y) (2.7)
Since $X$ and $Y$ are Gaussian random variables, we have:
$$h(Z) = 2 \times \frac{1}{2}\log\left(2\pi\sigma^2 \exp(1)\right) \tag{2.8}$$
$$= \log\left(2\pi\sigma^2 \exp(1)\right). \tag{2.9}$$
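Similarly, a Monte Carlo estimate of the joint entropy $h(X, Y)$ can be compared with the closed form (2.9); the per-component standard deviation and sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n = 0.8, 500_000                      # illustrative per-component std and sample size

x = rng.normal(0.0, sigma, size=n)
y = rng.normal(0.0, sigma, size=n)
# Joint density of independent X, Y ~ N(0, sigma^2): f(x, y) = f_X(x) f_Y(y).
log_f = -np.log(2 * np.pi * sigma ** 2) - (x ** 2 + y ** 2) / (2 * sigma ** 2)
h_z_mc = -np.mean(log_f)                     # h(Z) = h(X, Y) = E[-log f(X, Y)]

h_z_closed = np.log(2 * np.pi * sigma ** 2 * np.e)        # value given by (2.9)
print(f"Monte Carlo h(Z) ~ {h_z_mc:.4f} nats, closed form {h_z_closed:.4f} nats")
```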
2.2 Second Question
Let us find the real distribution $f$ that solves the following convex optimisation problem:
$$\min_f \int \log(f(x))\,dP_X$$
$$\text{s.t.} \quad \int dP_X = 1, \qquad \int x\,dP_X = 0, \qquad \int x^2\,dP_X = \sigma^2.$$
Proof. The Lagrangian is given by:
$$\mathcal{L} = \int \log(f(x))\,dP_X + \lambda_0\left(\int dP_X - 1\right) + \lambda_1\left(\int x\,dP_X\right) + \lambda_2\left(\int x^2\,dP_X - \sigma^2\right). \tag{2.10}$$
Then we differentiate with respect to $f(y)$, for every $y$:
$$\frac{\delta \mathcal{L}}{\delta f(y)} = 1 + \log(f(y)) + \lambda_0 + \lambda_1 y + \lambda_2 y^2. \tag{2.11}$$
Setting this derivative to zero gives, for every $y$,
$$f(y) = \exp\left(-1 - \lambda_0 - \lambda_1 y - \lambda_2 y^2\right). \tag{2.12}$$
Therefore the distribution that maximizes the differential entropy has the following form:
$$f(x) = \exp(-1 - \lambda_0)\exp\left(-(\lambda_1 x + \lambda_2 x^2)\right). \tag{2.13}$$
Then, solving the system of equations imposed by the constraints, we find a member of the exponential family: the Gaussian distribution with mean $0$ and variance $\sigma^2$.
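To illustrate the conclusion (this comparison is not part of the solution), the sketch below compares the closed-form differential entropies of a few zero-mean distributions sharing the same variance $\sigma^2$; the Gaussian attains the largest value. The choice of $\sigma$ and of the comparison distributions (Laplace and uniform) is arbitrary.

```python
import numpy as np

sigma = 1.3                                   # illustrative common standard deviation

h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)   # N(0, sigma^2)
b = sigma / np.sqrt(2)                                   # Laplace scale giving variance sigma^2
h_laplace = 1.0 + np.log(2 * b)
a = sigma * np.sqrt(3)                                   # Uniform(-a, a) with variance sigma^2
h_uniform = np.log(2 * a)

print(f"h(Gaussian) = {h_gauss:.4f} nats")
print(f"h(Laplace)  = {h_laplace:.4f} nats")
print(f"h(Uniform)  = {h_uniform:.4f} nats")
# The Gaussian entropy is the largest, as predicted by the maximum-entropy argument.
```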
3 Exercise
3.1 First Question
Show that
I(X, Y) = I(Y, X). (3.1)
Solution 1:
Since fX,Y = fY,X ,
I(X, Y) = I(Y, X). (3.2)
Solution 2:
$$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(Y, X). \tag{3.3}$$
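For intuition (not part of the solution), the symmetry can also be checked numerically on an arbitrary discrete joint distribution: computing $I(X; Y)$ from $p(x, y)$ and from the transposed table gives the same value.

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """I(X; Y) in nats for a discrete joint distribution p_xy (rows: x, columns: y)."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

# Arbitrary joint distribution used only for illustration (entries sum to 1).
p = np.array([[0.10, 0.05, 0.15],
              [0.20, 0.30, 0.20]])
print(mutual_information(p), mutual_information(p.T))   # identical values
```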
3.2 Second Question
Show that
$$I(X; Y \mid Z) = I(X; Y, Z) - I(X; Z). \tag{3.4}$$
We know that,
$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z). \tag{3.5}$$
Then we know that,
$$I(X; Y, Z) = H(X) - H(X \mid Y, Z), \tag{3.6}$$
and
$$I(X; Z) = H(X) - H(X \mid Z). \tag{3.7}$$
Thus,
$$I(X; Y, Z) - I(X; Z) = H(X) - H(X \mid Y, Z) - \left(H(X) - H(X \mid Z)\right) = H(X \mid Z) - H(X \mid Y, Z). \tag{3.8}$$
Therefore,
$$I(X; Y \mid Z) = I(X; Y, Z) - I(X; Z). \tag{3.9}$$
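A numerical spot check of identity (3.9) on a randomly generated discrete joint distribution $p(x, y, z)$, with all quantities computed from joint entropies; the support sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def H(p):
    """Shannon entropy (nats) of a probability array, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Arbitrary joint distribution over (X, Y, Z) with 2 x 3 x 2 support.
p_xyz = rng.random((2, 3, 2))
p_xyz /= p_xyz.sum()

p_xz = p_xyz.sum(axis=1)           # marginal over (X, Z)
p_yz = p_xyz.sum(axis=0)           # marginal over (Y, Z)
p_x = p_xyz.sum(axis=(1, 2))
p_z = p_xyz.sum(axis=(0, 1))

# I(X; Y | Z) = H(X | Z) - H(X | Y, Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
i_x_y_given_z = H(p_xz) + H(p_yz) - H(p_xyz) - H(p_z)
# I(X; Y, Z) = H(X) + H(Y, Z) - H(X, Y, Z);  I(X; Z) = H(X) + H(Z) - H(X, Z)
i_x_yz = H(p_x) + H(p_yz) - H(p_xyz)
i_x_z = H(p_x) + H(p_z) - H(p_xz)

print(np.isclose(i_x_y_given_z, i_x_yz - i_x_z))   # True: identity (3.9)
```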
4 Exercise
Show that the following map is a proper metric (i.e., it respects non-negativity, symmetry, the triangle inequality, and the identity of indiscernibles):
$$D(X, Y) = H(X, Y) - I(X; Y). \tag{4.1}$$
Solution:
Non-negativity: The aim here is to prove:
$$D(X, Y) \geq 0. \tag{4.2}$$
Proof.
$$D(X, Y) = H(X, Y) - I(X; Y) = H(X, Y) - \left(H(X, Y) - H(X \mid Y) - H(Y \mid X)\right) \tag{4.3}$$
$$= H(X \mid Y) + H(Y \mid X) \tag{4.4}$$
$$\geq 0. \tag{4.5}$$
The inequality in (4.5) holds because the conditional entropy is non-negative.
Symmetry: We want to prove that:
D(X, Y) = D(Y, X). (4.6)
Proof.
$$D(X, Y) = H(X \mid Y) + H(Y \mid X), \tag{4.7}$$
which is symmetric in $X$ and $Y$, hence $D(X, Y) = D(Y, X)$.
Triangle inequality: We want to prove that:
$$D(X, Z) \leq D(X, Y) + D(Y, Z), \tag{4.8}$$
which is equivalent to:
$$H(X \mid Z) + H(Z \mid X) \leq H(X \mid Y) + H(Y \mid X) + H(Y \mid Z) + H(Z \mid Y). \tag{4.9}$$
Thus it is sufficient to prove that the triangle inequality is respected by the conditional entropy:
$$H(X \mid Z) \leq H(X \mid Y) + H(Y \mid Z). \tag{4.10}$$
Proof. We know that,
$$I(X; Z \mid Y) = H(X \mid Y) - H(X \mid Z, Y) \geq 0, \tag{4.11}$$
which implies
$$H(X \mid Y) \geq H(X \mid Z, Y). \tag{4.12}$$
Thus,
$$H(X \mid Y, Z) + H(Y \mid Z) \leq H(X \mid Y) + H(Y \mid Z). \tag{4.13}$$
Then, since
$$H(X \mid Y, Z) + H(Y \mid Z) = H(X, Y \mid Z), \tag{4.14}$$
we have
$$H(X, Y \mid Z) \leq H(X \mid Y) + H(Y \mid Z), \tag{4.15}$$
and,
$$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z) \geq H(X \mid Z), \tag{4.16}$$
we find that
$$H(X \mid Z) \leq H(X \mid Y) + H(Y \mid Z). \tag{4.17}$$
Identity of indiscernibles: We have to show that:
$$D(X, Y) = 0 \iff X = Y, \tag{4.18}$$
which is equivalent to prove that:
$$H(X \mid Y) + H(Y \mid X) = 0 \iff X = Y. \tag{4.19}$$
Proof. We know that $H(X \mid Y) = 0$ if and only if $X$ is a function of $Y$. Similarly, $H(Y \mid X) = 0$ if and only if $Y$ is a function of $X$. As a result,
$$H(X \mid Y) = H(Y \mid X) = 0 \text{ if and only if } X = Y. \tag{4.20}$$
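As an illustration (not part of the solution), the metric properties can be spot-checked numerically on discrete joint distributions: the sketch below draws a random $p(x, y, z)$ and verifies non-negativity, symmetry, and the triangle inequality for $D$. The support sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def H(p):
    """Shannon entropy (nats), ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def D(p_ab):
    """Variation of information D(A, B) = H(A, B) - I(A; B) = H(A|B) + H(B|A)."""
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    i_ab = H(p_a) + H(p_b) - H(p_ab)
    return H(p_ab) - i_ab

# Arbitrary joint distribution over (X, Y, Z).
p_xyz = rng.random((3, 3, 3))
p_xyz /= p_xyz.sum()
p_xy, p_xz, p_yz = p_xyz.sum(axis=2), p_xyz.sum(axis=1), p_xyz.sum(axis=0)

print("non-negativity:", D(p_xy) >= 0)
print("symmetry:      ", np.isclose(D(p_xy), D(p_xy.T)))
print("triangle ineq.:", D(p_xz) <= D(p_xy) + D(p_yz) + 1e-12)
```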
5 Exercise
We have to show that, given $x_i$ and $x_j$ two eigenvectors of a symmetric matrix $A$ associated with distinct eigenvalues $\lambda_i \neq \lambda_j$, we have $\langle x_i, x_j \rangle = 0$.
Proof. Notice that by definition, the eigenspace of a matrix $A$ associated with the eigenvalue $\lambda$ is
$$E_\lambda = \{x \mid Ax = \lambda x\}. \tag{5.1}$$
For all $i, j$ we have
$$Ax_i = \lambda_i x_i, \tag{5.2}$$
$$Ax_j = \lambda_j x_j. \tag{5.3}$$
Thus,
$$\langle x_j, Ax_i \rangle = \lambda_i \langle x_j, x_i \rangle, \tag{5.4}$$
$$\langle x_i, Ax_j \rangle = \lambda_j \langle x_i, x_j \rangle. \tag{5.5}$$
Since $A$ is symmetric (i.e. $A^T = A$),
$$\langle x_j, Ax_i \rangle = \langle A^T x_j, x_i \rangle = \langle Ax_j, x_i \rangle = \lambda_j \langle x_j, x_i \rangle. \tag{5.6}$$
Therefore,
$$(\lambda_i - \lambda_j)\langle x_i, x_j \rangle = 0. \tag{5.7}$$
Since $\lambda_i \neq \lambda_j$,
$$\langle x_i, x_j \rangle = 0 \quad \forall i \neq j. \tag{5.8}$$
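A quick numerical illustration of this result (not part of the proof): build a random symmetric matrix, compute its eigenvectors with numpy.linalg.eigh (which is designed for symmetric matrices), and check that they are pairwise orthogonal. The matrix size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                        # random symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)     # columns of eigvecs are eigenvectors of A
gram = eigvecs.T @ eigvecs               # matrix of inner products <x_i, x_j>

# Off-diagonal entries are (numerically) zero: the eigenvectors are pairwise orthogonal.
print(np.allclose(gram, np.eye(n)))
```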
By plotting the running average of each channel, we conclude that the dataset is ergodic, since the averages converge to a constant for large values of $n$.
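A minimal sketch of the kind of check described above, assuming the dataset is available as a 2-D array of shape (n_samples, n_channels); the file name and loading step are hypothetical, since the text does not specify them.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical loading step: the actual dataset/file is not specified in the text.
data = np.load("dataset.npy")            # assumed shape: (n_samples, n_channels)

n = np.arange(1, data.shape[0] + 1)
running_mean = np.cumsum(data, axis=0) / n[:, None]    # sample average up to index n

for ch in range(data.shape[1]):
    plt.plot(n, running_mean[:, ch], label=f"channel {ch}")
plt.xlabel("n")
plt.ylabel("running average")
plt.legend()
plt.show()   # ergodicity (in the mean) is suggested if every curve flattens to a constant
```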