
Solution

Homework 1 - ELEC 475-575

1    Exercise

Show that:

I(X, Y) = E_{(X,Y)}[ log( f_{X,Y}(X, Y) / (f_X(X) f_Y(Y)) ) ] ≥ 0.                                         (1.1)

Solution 1:

Let’s first establish an inequality for log(x), valid for all x > 0; recall that by definition any density f satisfies f(x) ≥ 0 for all x.

We know that log is differentiable and concave, so it is bounded above by its first-order Taylor approximation:

log(y) ≤ log(x) + (1/x)(y − x),   ∀x, y ∈ ℝ₊.                                         (1.2)

Therefore, for x = 1 and ∀y ∈ ℝ₊ we have,

log(y) ≤ y − 1.

Then, we have

I(X, Y) = E_{(X,Y)}[ log( f_{X,Y}(X, Y) / (f_X(X) f_Y(Y)) ) ]                                            (1.3)

= −E_{(X,Y)}[ log( f_X(X) f_Y(Y) / f_{X,Y}(X, Y) ) ].                                         (1.4)

Let’s prove that for any real-valued µ-integrable functions f and g, if g ≤ f then ∫ g dµ ≤ ∫ f dµ for any positive measure µ.

Proof. We have,

f = f₊ − f₋,                                                              (1.5)

where f₊(x) = max(f(x), 0) and f₋(x) = max(−f(x), 0), and we know that for any µ-integrable f we have,

∫ f dµ = ∫ f₊ dµ − ∫ f₋ dµ.                                               (1.6)

Since f₊ ≥ g₊ and both are non-negative functions, we have the following result,

∫ f₊ dµ ≥ ∫ g₊ dµ.                                                        (1.7)

Then, since g ≤ f, we have that −f(x) ≤ −g(x), thus

max(−f(x), 0) ≤ max(−g(x), 0),                                            (1.8)

and since max(−f(x), 0) is a non-negative function,

∫ max(−f(x), 0) dµ ≤ ∫ max(−g(x), 0) dµ,                                  (1.9)

that is,

∫ f₋ dµ ≤ ∫ g₋ dµ.                                                        (1.10)

As a result,

∫ f dµ = ∫ f₊ dµ − ∫ f₋ dµ ≥ ∫ g₊ dµ − ∫ g₋ dµ = ∫ g dµ.                  (1.11)

From that proof, and denoting by f_{X,Y} = dP_{X,Y}/dλ the Radon-Nikodym derivative of the joint distribution of (X, Y) with respect to the Lebesgue measure λ, we have

I(X, Y) ≥ E_{(X,Y)}[ 1 − f_X(X) f_Y(Y) / f_{X,Y}(X, Y) ]                                 (1.12)

= ∫ f_{X,Y}(x, y) ( 1 − f_X(x) f_Y(y) / f_{X,Y}(x, y) ) dλ                               (1.13)

= ∫ f_{X,Y}(x, y) dλ − ∫ f_X(x) f_Y(y) dλ.                                               (1.14)

Since, by definition of density functions, both f_X and f_Y are λ-integrable, Fubini’s theorem applies. In addition, a density function integrates to 1, thus

I(X, Y) ≥ ∫∫ f_{X,Y}(x, y) dx dy − ∫∫ f_X(x) f_Y(y) dx dy                         (1.15)

= 1 − 1 = 0.                                                                      (1.16)

Solution 2:

I(X, Y) = −E_{(X,Y)}[ log( f_X(X) f_Y(Y) / f_{X,Y}(X, Y) ) ],                                       (1.17)

and since −log is a convex function, we have by Jensen’s inequality,

I(X, Y) = E_{(X,Y)}[ −log( f_X(X) f_Y(Y) / f_{X,Y}(X, Y) ) ]                                                 (1.18)

≥ −log( ∫∫ f_{X,Y}(x, y) ( f_X(x) f_Y(y) / f_{X,Y}(x, y) ) dx dy )                           (1.19)

= −log( ∫∫ f_X(x) f_Y(y) dx dy ).                                                      (1.20)

By Fubini’s theorem and the fact that a density function integrates to 1, we find that

I(X, Y) ≥ 0.                                                             (1.21)

Notice that equality holds if the two random variables X and Y are independent.
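As an optional numerical illustration (not part of the graded solution), the sketch below computes the mutual information of a discrete joint distribution with NumPy and checks that it is strictly positive for a dependent pair and essentially zero when the joint factorizes. The 4×4 distributions and the helper name mutual_information are arbitrary choices made only for this example.

    import numpy as np

    def mutual_information(p_xy):
        # I(X, Y) = sum_{x,y} p(x, y) * log( p(x, y) / (p(x) p(y)) ), in nats
        p_x = p_xy.sum(axis=1, keepdims=True)
        p_y = p_xy.sum(axis=0, keepdims=True)
        mask = p_xy > 0
        return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

    rng = np.random.default_rng(0)

    p = rng.random((4, 4))
    p /= p.sum()                                   # random dependent joint
    print("dependent:  ", mutual_information(p))   # strictly positive

    p_x = rng.random(4); p_x /= p_x.sum()
    p_y = rng.random(4); p_y /= p_y.sum()
    print("independent:", mutual_information(np.outer(p_x, p_y)))  # ~ 0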

2    Exercise

The differential entropy of a random variable X with density f is defined as:

h(X) = −∫ log(f(x)) dP_X.                                                                                     (2.1)

2.1    First Question

 Real Case:

Given X ∼ N(µ, σ²).

Proof.

h(X) = −∫ log( (1/√(2πσ²)) exp( −(x − µ)² / (2σ²) ) ) dP_X                                        (2.2)

= log(√(2πσ²)) ∫ dP_X + (1/(2σ²)) ∫ (x − µ)² dP_X                                                 (2.3)

= (1/2) log(2πσ²) + (1/(2σ²)) E[(X − E[X])²]                                                      (2.4)

= (1/2) log(2πσ²) + 1/2                                                                           (2.5)

= (1/2) log(2πσ² exp(1)).                                                                         (2.6)

Complex Case:

Proof. We consider a circularly symmetric complex Gaussian random variable Z. Then we know that Z = X + iY, where X and Y are i.i.d. centered Gaussian variables. The differential entropy of Z is given by the joint differential entropy h(X, Y). Since X and Y are independent random variables, we have that:

h(Z) = h(X) + h(Y)                                                        (2.7)

Since X and Y are Gaussian random variables, we have that:

h(Z) = 2 × (1/2) log(2πσ² exp(1))                                                (2.8)

= log(2πσ² exp(1)).                                                          (2.9)
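As an optional sanity check of (2.6) (not required by the exercise), the following sketch estimates h(X) = −E[log f(X)] by Monte Carlo for an illustrative choice µ = 1, σ = 2, which are assumed values, and compares it with the closed form (1/2) log(2πσ² e).

    import numpy as np

    mu, sigma = 1.0, 2.0                      # illustrative parameters only
    rng = np.random.default_rng(0)
    x = rng.normal(mu, sigma, size=1_000_000)

    # log-density of N(mu, sigma^2) evaluated at the samples
    log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

    h_mc = -log_f.mean()                                  # Monte Carlo estimate of h(X)
    h_cf = 0.5 * np.log(2 * np.pi * sigma**2 * np.e)      # closed form (2.6)
    print(h_mc, h_cf)                                     # agree up to sampling noise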

2.2    Second Question

Let’s find the real distribution f that solves the following convex optimisation problem:

min_f   ∫ log(f(x)) dP_X                                                         (2.10)

s.t.   ∫ dP_X = 1,   ∫ x dP_X = 0,   ∫ x² dP_X = σ².                             (2.11)

Proof.  The Lagrangian is given by:

L(f) = ∫ log(f(x)) dP_X + λ₀ ( ∫ dP_X − 1 ) + λ₁ ( ∫ x dP_X ) + λ₂ ( ∫ x² dP_X − σ² ).           (2.12)

Then we differentiate with respect to f(y), ∀y,

δL / δf(y) = 1 + log(f(y)) + λ₀ + λ₁ y + λ₂ y².                                  (2.13)

Setting this derivative to zero, ∀y,

f(y) = exp(−1 − λ₀ − λ₁ y − λ₂ y²).                                              (2.14)

Therefore the distribution that maximizes the differential entropy has the following form,

f(x) = exp(−1 − λ₀) exp(−(λ₁ x + λ₂ x²)).                                        (2.15)

Then, solving for the multipliers (λ₀, λ₁, λ₂) so that the constraints are satisfied, we find a member of the exponential family: the Gaussian distribution with mean 0 and variance σ².
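To illustrate this conclusion (not part of the solution itself), one can compare closed-form differential entropies of a few zero-mean laws rescaled to the same variance σ²; the Gaussian value (1/2) log(2πeσ²) should dominate. The value of σ below is an arbitrary illustrative choice.

    import numpy as np

    sigma = 1.5                                           # illustrative value
    h_gauss   = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
    h_uniform = np.log(2 * np.sqrt(3) * sigma)            # U(-a, a) with a = sqrt(3)*sigma
    h_laplace = 1 + np.log(np.sqrt(2) * sigma)            # Laplace(b) with b = sigma/sqrt(2)
    print(h_gauss, h_uniform, h_laplace)                  # h_gauss is the largest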

3    Exercise

3.1    First Question

Show that

I(X, Y) = I(Y, X).                                                          (3.1)

Solution 1:

Since fX,Y  = fY,X ,

I(X, Y) = I(Y, X).                                                          (3.2)

Solution 2:

I(X, Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = I(Y, X)                         (3.3)

3.2    Second Question

Show that

I(X; Y|Z) = I(X; Y, Z) − I(X; Z).                                             (3.4)

We know that,

I(X; Y|Z) = H(X|Z) − H(X|Y, Z).                                            (3.5)

Then we know that,

I(X; Y, Z) = H(X) − H(X|Y, Z),                                              (3.6)

and

I(X; Z) = H(X) − H(X|Z).                                                  (3.7)

Thus,

I(X; Y, Z) − I(X; Z) = H(X) − H(X|Y, Z) − (H(X) − H(X|Z)) = H(X|Z) − H(X|Y, Z).     (3.8)

Therefore,

I(X; Y|Z) = I(X; Y, Z) − I(X; Z).                                             (3.9)
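As an optional numerical check of (3.4) (not part of the solution), the sketch below draws a random discrete joint distribution of (X, Y, Z) and verifies that I(X; Y|Z) and I(X; Y, Z) − I(X; Z) coincide, writing every mutual information in terms of joint entropies; the shapes and seed are arbitrary.

    import numpy as np

    def H(p):
        # Shannon entropy (nats) of an array of probabilities summing to 1
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    rng = np.random.default_rng(1)
    p_xyz = rng.random((3, 4, 2)); p_xyz /= p_xyz.sum()   # random joint of (X, Y, Z)

    p_xz = p_xyz.sum(axis=1); p_yz = p_xyz.sum(axis=0)
    p_x  = p_xyz.sum(axis=(1, 2)); p_z = p_xyz.sum(axis=(0, 1))

    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    lhs = H(p_xz) + H(p_yz) - H(p_xyz) - H(p_z)
    # I(X;Y,Z) - I(X;Z), with I(X;Y,Z) = H(X)+H(Y,Z)-H(X,Y,Z) and I(X;Z) = H(X)+H(Z)-H(X,Z)
    rhs = (H(p_x) + H(p_yz) - H(p_xyz)) - (H(p_x) + H(p_z) - H(p_xz))
    print(lhs, rhs)                                       # the two values coincide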

4    Exercise

Show that the following application is a proper metric (i.e., it satisfies non-negativity, symmetry, the triangular inequality and the identity of indiscernibles):

D(X, Y) = H(X, Y) − I(X; Y)                                                (4.1)

Solution:

Non-negativity:  The aim here is to prove:

D(X, Y) ≥ 0                                                               (4.2)

Proof.

D(X, Y) = H(X, Y) − I(X, Y) = H(X, Y) − (H(X, Y) − H(X|Y) − H(Y|X))         (4.3)

= H(X|Y) + H(Y|X)                                                                                      (4.4)

≥ 0.                                                                                                                     (4.5)

The inequality (4.5) holds because conditional entropy is non-negative.

Symmetry: We want to prove that:

D(X, Y) = D(Y, X).                                                        (4.6)

Proof.

D(X, Y) = H(X|Y) + H(Y|X) = H(Y|X) + H(X|Y) = D(Y, X).                                               (4.7)

Triangular inequality: We want to prove that:

D(X, Z) ≤ D(X, Y) + D(Y, Z),                                               (4.8)

which is equivalent to:

H(X|Z) + H(Z|X) ≤ H(X|Y) + H(Y|X) + H(Y|Z) + H(Z|Y).                    (4.9)

Thus it suffices to prove that the triangular inequality is respected by the conditional entropy:

H(X|Z) ≤ H(X|Y) + H(Y|Z).                                                  (4.10)

Proof. We know that,

I(X; Z|Y) = H(X|Y) − H(X|Z, Y) ≥ 0,                                        (4.11)

which implies

H(X|Y) ≥ H(X|Z, Y).                                                        (4.12)

Thus,

H(X|Y, Z) + H(Y|Z) ≤ H(X|Y) + H(Y|Z).                                      (4.13)

Then, since

H(X|Y, Z) + H(Y|Z) = H(X, Y|Z),                                            (4.14)

we have,

H(X, Y|Z) ≤ H(X|Y) + H(Y|Z),                                               (4.15)

and, since

H(X, Y|Z) = H(X|Z) + H(Y|X, Z) ≥ H(X|Z),                                   (4.16)

we find that,

H(X|Z) ≤ H(X|Y) + H(Y|Z).                                                  (4.17)

Identity of indiscernibles:  We have to show that:

D(X, Y) = 0 ⟺ X = Y,                                                    (4.18)

which is equivalent to proving that:

H(X|Y) + H(Y|X) = 0 ⟺ X = Y.                                           (4.19)

Proof. We know that H(X|Y) = 0 if and only if X is a function of Y. Similarly, H(Y|X) = 0 if and only if Y is a function of X. As a result,

H(X|Y) = H(Y|X) = 0 if and only if X = Y (i.e., each variable is a deterministic function of the other).                    (4.20)
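As an optional empirical check (not part of the solution), the sketch below draws random joint distributions of (X, Y, Z) and verifies non-negativity, symmetry and the triangular inequality for D, using D(A, B) = H(A|B) + H(B|A) = 2H(A, B) − H(A) − H(B); the alphabet sizes and seed are arbitrary.

    import numpy as np

    def H(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def D(p_ab):
        # D(A, B) = H(A|B) + H(B|A) = 2 H(A,B) - H(A) - H(B)
        return 2 * H(p_ab) - H(p_ab.sum(axis=1)) - H(p_ab.sum(axis=0))

    rng = np.random.default_rng(2)
    for _ in range(1000):
        p = rng.random((3, 3, 3)); p /= p.sum()           # random joint of (X, Y, Z)
        p_xy, p_xz, p_yz = p.sum(axis=2), p.sum(axis=1), p.sum(axis=0)
        assert D(p_xy) >= -1e-12                          # non-negativity
        assert abs(D(p_xy) - D(p_xy.T)) < 1e-12           # symmetry
        assert D(p_xz) <= D(p_xy) + D(p_yz) + 1e-12       # triangular inequality
    print("all checks passed")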

5    Exercise

We have to show that: given x_i and x_j two eigenvectors of a symmetric matrix A associated with distinct eigenvalues λ_i ≠ λ_j, we have ⟨x_i, x_j⟩ = 0.

Proof.  Notice that, by definition, the eigenspace of A associated with an eigenvalue λ is defined by:

E_λ = { x | Ax = λx }.                                                      (5.1)

∀i, j we have,

A x_i = λ_i x_i,                                                            (5.2)

A x_j = λ_j x_j.                                                            (5.3)

Thus,

⟨x_j, A x_i⟩ = λ_i ⟨x_j, x_i⟩,                                              (5.4)

⟨x_i, A x_j⟩ = λ_j ⟨x_i, x_j⟩.                                              (5.5)

Since A is symmetric (i.e., Aᵀ = A),

⟨A x_j, x_i⟩ = ⟨Aᵀ x_j, x_i⟩ = ⟨x_j, A x_i⟩ = λ_i ⟨x_j, x_i⟩.               (5.6)

Combining (5.5) and (5.6), we obtain

(λ_i − λ_j) ⟨x_i, x_j⟩ = 0.                                                 (5.7)

Since λ_i ≠ λ_j, we conclude that ∀i ≠ j,

⟨x_i, x_j⟩ = 0.                                                             (5.8)
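A short numerical illustration of this result (not part of the solution): for a random symmetric matrix, numpy.linalg.eigh returns eigenvectors whose Gram matrix is the identity, i.e. the eigenvectors are mutually orthogonal, as proved above (for repeated eigenvalues, eigh simply picks an orthogonal basis of each eigenspace). The matrix size and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.random((5, 5))
    A = (B + B.T) / 2                        # random symmetric matrix

    w, V = np.linalg.eigh(A)                 # columns of V are eigenvectors of A
    gram = V.T @ V                           # all pairwise inner products <x_i, x_j>
    print(np.allclose(gram, np.eye(5)))      # True: off-diagonal entries are ~0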

 

By plotting, for each channel, the running time average as a function of n, we conclude that the dataset is ergodic, since the averages converge to a constant for large values of n.
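A minimal sketch of the ergodicity check described above, assuming the dataset is available as an array data of shape (n_channels, n_samples); the simulated array below is only a placeholder to make the sketch runnable.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    data = rng.normal(size=(4, 10_000))          # placeholder for the real dataset

    n = np.arange(1, data.shape[1] + 1)
    running_mean = np.cumsum(data, axis=1) / n   # time average of each channel up to sample n

    for ch, avg in enumerate(running_mean):
        plt.plot(n, avg, label=f"channel {ch}")
    plt.xlabel("n"); plt.ylabel("running time average")
    plt.legend(); plt.show()

If each curve flattens to a constant as n grows, the time averages converge, which is the behaviour reported above.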