关键词 > MTHE-MATH474/874

MTHE - MATH 474/874 - Information Theory Fall 2021

发布时间：2022-01-13

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MTHE - MATH 474/874 - Information Theory

Fall 2021

1. Decide whether each of the following statements is true or false. Prove the validity of those that are true and give counterexamples or arguments based on known facts to disprove those that are false.

(a) Let X and Y be two jointly distributed discrete random variables with alphabet x and y, respectively. Deﬁne the random variable Z by Z = g(X), where g(.) is an arbitrary function with domain x . Then H(Y lX) 5 H(Y lZ).

(b) Every Huﬀman code for a discrete memoryless source (DMS) has a corresponding suﬃx code with the same average code rate.

(d) There exists a uniformly distributed random variable X with diﬀerential entropy h(X) = -3 bits.

(e) Random coding is a method of proving the existence of codes with certain desirable properties.

(f) The discrete entropy function H(X) of a random variable X with ﬁnite alphabet x and pmf p(.) is concave in p(.).

(g) The diﬀerential entropy h(X) is invariant under invertible transformations; i.e., if g is invertible then h(X) = h(g(X)).

(h) The rate of an n-th order Huﬀman code for a DMS nXi }1(o) is arbitrarily close to the source entropy H(X) iﬀ n - o.

(i) Among all probability density functions with support the interval [a, b], the uniform density function has the largest diﬀerential entropy.

(j) The capacity of a DMC with ﬁnite input and output alphabets is always bounded from above.

(k) Suppose that random variables X , Y and Z are jointly Gaussian, each with mean 0 and variance 1. Assume that X - Y - Z and that E[XY] = ρ, where 0 < ρ < 1. Then

I(X; Z) > log2 ┌ ! .

(l) Let random variable X be uniformly distributed over the interval [0, 1], and deﬁne random variable Y by Y = 2X + 2. Then h(Y) < 0.

(m) For a stationary source nXn }, its normalized joint entropy H(X1 , . . . , Xn ) is nonincreasing in n.

(o) The log-sum inequality can be proven using Jensen’s inequality.

(p) For a time-invariant Markov chain nXi } with ﬁnite alphabet x = n1, 2, . . . , m}, transition matrix Q = [Px2 |x1 ] and stationary distribution vector π = (π1 , . . . , πm ), if Px1 = π, then

D(Pxn lPxn − 1 ) > 0 Vn > 2.

(q) Given a positive integer m, let 0 < pi , qi < 1 with pi qi i = 1, 2, . . . , m. If Lpi = L qi = 1, then

L qi(太) pi(1) -太 < 1

i=1

for any ﬁxed parameter 0 < λ < 1.

(r) For a uniformly distributed memoryless source with alphabet x = na1 , a2 , . . . , a9 }, it is not possible to design a ternary ﬁrst-order Huﬀman code c with length variance Var(l(c)) = 0.

(s) Let X1 , X2 , . . . , Xn be inputs to a discrete memoryless channel (x , y, PY |x), with ﬁnite input alphabet x and ﬁnite output alphabet y, and let Y1 , Y2 , . . . , Yn be the corresponding outputs. Then it is possible to construct the channel’s transition distribution PY |x and an input distribution Pxn such that the inputs X1 , X2 , . . . , Xn are dependent while the corresponding outputs Y1 , Y2 , . . . , Yn are independent.

(t) If X and Y are real-valued independent random variables, then

h(-4X - 3Y - 10) > h(X) + 2 (in bits).

2. Answer the following questions.

(a) Let random variables X and Y have Gaussian joint distribution, with E[X] = 0, E[Y] = 0, E[X2] = σ 1(2) , E[Y2] = σ2(2), and E[XY] = σ 1 σ2 ρ, where 0 5 ρ < 1. Find I(X; Y) in terms of ρ. For what value of ρ are X and Y independent?

(b) A discrete-time memoryless Gaussian channel has equal input and noise powers. Find the capacity of this channel in bits/channel use and explain its operational signiﬁcance.

3. The lognormal density of a positive random variable X is given by

1 (ln z −u)2

fx (x) = σx′2πe- 2g2 , x > 0, -o < µ < +o, σ > 0.

Compute the diﬀerential entropy h(X) in nats.

4. (a) State the Data Processing Theorem and prove it.

(b) The output of a discrete memoryless channel K1 is connected to the input of another discrete memoryless channel K2 . Show that the capacity of the cascade combination can never exceed the capacity of Ki , i = 1, 2.

5. Let X , Y and Z denote three discrete random variables. Show that

H(XlY) + H(Y lZ) > H(XlZ).

6. Consider a discrete-time binary stationary Markov source nU~ } with transition prob- ability

PrnU~+1 = jlU~ = i} =

(a) Compute the source entropy rate H(u).

(b) Is it possible to reliably transmit this Markov source via a rate-one source-channel code across a discrete memoryless channel with capacity C = 0.5 bits/channel use? Explain qualitatively.

7. Let X be a binary random variable with alphabet x = n0, 1}. Let Z denote another random variable that is independent of X and taking values in z = n0, 1, 2, 3} such that PrnZ = 0} = PrnZ = 1} = PrnZ = 2} = ∈, where 0 < ∈ < 1. Consider a discrete memoryless channel with input X, noise Z, and output Y described by the equation:

Y = 3X + (-1)x Z,

where X and Z are as deﬁned above.

(a) Determine the channel transition probability matrix Q := [p(ylx)], and draw the channel transition diagram.

(b) Compute the capacity C of this channel in terms of ∈. What is the maximizing input distribution that achieves capacity?

(c) For what value of ∈ is the noise entropy H(Z) maximized? What is the value of C for this choice of ∈? Comment on the result.

8. (a) Find the entropy rate in bits per source symbol of the binary stationary Markov chain with transition matrix

P = ! .

(b) Design a second-order binary Shannon code for the source. Is it optimal?

(c) For what values of p, can you transmit reliably the source via a rate-2 source-channel code over a channel with the following matrix

Q = ┌ 1 p 1 p ! .

9. Let nXi } be a stationary source with alphabet x . Deﬁne a new source nYi } with alphabet y by Yi = f (Xi ) where f (.) is a given function. Compare the entropy rates of the two sources – i.e., ﬁnd the relationship between H(x) and H(y).

10. Inequalities. Which of the following inequalities are >, =, 5? Label each with >, =, or

5 and justify your answer.

(a) I(g(X); Y) versus I(X; Y) where g(.) is a given function.

(b) H(X, Y)/(H(X) + H(Y)) versus 1.

(d) H(X3 lX2 ) versus limn→o H(Xn lXn-1 , . . . , X1 ) for a stationary Markov source nXn } . (e) h(4X + 2) versus h(X) for a real-valued random variable X .

11. Binary-Input Channel with Erasures and Errors. Consider the following two-input, three- output discrete memoryless channel

0 \一p\ 0

1 /p///g(/)

1 − p − g

where p e (0, 1), q e (0, 1) and p + q e (0, 1).

(a) Find the capacity C of this channel in terms of p and q. What is the maximizing

input distribution that achieves capacity?

(b) If p = q = 1/3, ﬁnd the value of C . Comment on the result.

12. (a) Fading channel. Consider an additive noise fading channel described by: Y = XV + Z,

where Y is the channel output, X is the channel input, V is a random variable representing fading, and Z is additive noise. Assume that X , Z and V are all independent of each other. Argue that knowledge of the fading factor V improves capacity by showing that

I(X; Y lV) > I(X; Y).

(b) Incidentally, note that conditioning does not always increase mutual information.

Give an example of jointly distribution random variables U , R and S such that I(U ; RlS) < I(U ; R).

13. Prove by using Jensen’s inequality that the divergence D(plq) > 0, where p(.) and q(.) are two probability mass functions deﬁned on the same alphabet.

14. The Sum Channel.

(a) Let (x1 , y1 , p1 (ylx)) be a discrete memoryless channel (DMC) with ﬁnite input alphabet x1 , ﬁnite output alphabet y1 , transition distribution p1 (ylx) and capacity C1 . Similarly, let (x2 , y2 , p2 (ylx)) be another DMC with capacity C2 . Assume that x1 ∩ x2 = 0 and that y1 ∩ y2 = 0.

Now let (x , y, p(ylx)) be the sum of these two channels where x = x1 u x2 ,

y = y1 u y2 and

! p1 (ylx)

p(ylx) = ì p2 (ylx)

( 0

if x e x1 , y e y1

if x e x2 , y e y2

otherwise.

Show that the capacity of the sum channel is given by

Csum = log2 ┌ 2b1 + 2b2 ┐

bits/channel use.

(b) Compute Csum above if the ﬁrst channel is a binary symmetric channel with crossover probability 0.11, and the second channel is a binary erasure channel with erasure probability 0.5.

15. Given a ﬁxed positive integer n > 1, consider an n-ary valued random variable X with

alphabet x = n1, 2, . . . , n} and distribution described by the probabilities pi := PrnX = i}, where pi > 0 for each i = 1, . . . , n. Given α > 0 and α 1, deﬁne the R´enyi entropy

of X as

Hà (X) := log2 - pi(à); .

(a) Show that

L pi(r) > 1

i=1

and that

L pi(r) < 1

i=1

if r < 1,

if r > 1,

[Hint: Show that the function f (r) =Lpi(r) is decreasing in r, where r > 0.]

(b) Show that

0 5 Hà (X) 5 log2 n.

[Hint: Use (a) for the lower bound, and use Jensen’s inequality (with the convex

function f (y) = y 1 −o , for y > 0) for the upper bound.]

16. Answer the following:

(a) State the channel coding theorem.

(b) Discuss brieﬂy two main tools used in proving the forward part of the channel coding theorem.

17. State the lossless joint source-channel coding theorem for a memoryless source-channel pair, and explain why it can be used to justify the separate design of the source and channel coding operations in a communication system.

18. Let X be a discrete random variable with alphabet x and distribution p(x) . Let f : x - R be a real-valued function, and let α be an arbitrary real number.

(a) Show that

H(X) 5 αE[f (X))] + log2 -Lzex 2-àf(z); ,

with equality iﬀ p(x) = 2-àf(z), where A :=Lzex 2-àf(z) .

(b) Show that for a positive integer-valued random variable N, the following hold: H(N) 5 log2 (E[N]) + log2 e.

Hint: First use part (a) with f(N) = N and α = log2 ; then use the fact that L an = if lal < 1 to obtain that

E[N] E[N]

E[N] - 1 E[N] - 1

and apply the fundamental inequality.

19. State the Asymptotic Equipartition Property (AEP) for a discrete memoryless source and explain its signiﬁcance vis-a-vis the ﬁxed-length source coding theorem.

20. Binary divergence. Show that the binary divergence deﬁned by

Db (plq) := p ln ╱ 、 + (1 - p) ln ╱ 、 ,

where 0 < p < 1, 0 < q < 1 and ln(.) denotes the natural logarithm, satisﬁes

(p - q)2

q(1 - q) .

Give a suﬃcient and necessary condition for equality.

21. Conditioning increases divergence. Given random variables X and Xˆ with common alphabet x , and random variable Z with alphabet z, show that

D(Px|Z llPxˆ |Z) > D(Px llPxˆ),

where

D(Px|ZllPxˆ |Z) :=L L PxAZ (x, z) log2 Px|Z(xlz)

and

D(Px llPxˆ) :=L Px (x) log2 Px (x)

22. Let nXn } be a discrete memoryless source (i.i.d.) with alphabet

x = na, b, c, d, e, f, g, h}

and probability mass function

PrnXn = a} = 1/4, PrnXn = b} = 1/4,

PrnXn = c} = 1/8, PrnXn = d} = 1/8,

PrnXn = e} = 1/16, PrnXn = f } = 1/16,

PrnXn = g} = 1/16, PrnXn = h} = 1/16.

(a) Compute the source entropy rate H(x). Provide the correct unit. (b) Compute the source redundancies ρ$ , ρd and ρ太 .

source. Determine the average codeword length l for this code. How does l compare

to H(x)?

(d) By observing the results in part (c), can you deduce the average rate R of an optimal binary 50th-order preﬁx code for the source? Explain your answer.

23. Let p(.) and q(.) be two diﬀerent probability distributions on the alphabet

x = na1 , . . . , a8 }

such that q(ai ) > 0 for all i = 1, . . . , K. Show that

> 1.

24. Let X and Y be jointly distributed discrete random variables. Show that

I(X; Y) > I(f (X); g(Y)),

where f (.) and g(.) are deterministic functions.

25. Converse of the Variable-Length Lossless Coding Theorem for Stationary Sources: Con- sider a stationary source nXi } with ﬁnite alphabet x and entropy rate H(x) (in bits). Show that any binary n-order uniquely decodable variable-length code fn : xn - n0, 1}* for the source has an average code rate satisfying

> H(x).

26. Data Storage Channel. Consider a discrete memoryless channel with input alphabet x = n0, 1, . . . , q - 1} and output alphabet y = n0, 1, . . . , q - 1, e} where q > 2 is an integer that is a power of 2 and symbol e denotes an erasure (e x). The channel, which is corrupted by a noise-erasure variable Z e n0, 1, e} that is independent of the input X, models storage devices where the input data X is stored in binary form, using a natural binary code (NBC) of length log2 q bits. When Z = e, an erasure occurs. In the non-erasure mode, if Z = 0, no error occurs and the storage device returns the input X perfectly, and if Z = 1, a hard failure occurs causing the storage device to ﬂip all the NBC bits representing X, which is equivalent to the operation q - 1 - X . More precisely, the channel output Y is described as follows:

e if Z = e

Y = ìX if Z = 0

q - 1 - X if Z = 1

where P [Z = e] = α and P [Z = 1] = ∈ with 0 < α + ∈ < 1.

(a) When q = 4, determine the channel transition probability matrix Q = [PY |x] and determine its capacity C in terms of α and ∈ .

(b) For general q > 2, determine the channel transition probability matrix Q = [PY |x] and determine its capacity C in terms of q , α and ∈ .

27. Consider a discrete-time memoryless additive Gaussian noise channel with input power constraint P > 0 and noise power (variance) N > 0. Let C = max长x:F[x2 ]5P I(X; Y) denote the channel capacity, where the maximization of the input-output mutual infor- mation is taken over all input distributions Fx satisfying the second-moment (power) constraint.

(a) Solve analytically the maximization in C and determine its resulting expression in terms of P and N .

(b) Looking at capacity C = C(P) as a function of the input power P (for a ﬁxed noise

power N), show that

C │ L(n)αi Pi ← >

L αi C(Pi )

i=1

for any given Pi > 0 and 0 < αi < 1, i = 1, . . . , n such that L αi = 1.

28. State and prove Fano’s inequality.

29. State and prove the necessary conditions of optimality for a binary preﬁx code.

30. Consider a discrete memoryless source nXi } with alphabet x and generic distribution p(.). Let c = f (x) be a uniquely decodable binary code

f : x - n0, 1}*

that maps single source letters onto binary strings such that its average code rate Rc

satisﬁes

Rc = H(X) bits/source symbol.

In other words, c is absolutely optimal.

Now consider a second binary code c/ = f/ (xn ) for the source that maps source n-tuples onto binary strings:

f/ : xn - n0, 1}* .

Provide a construction for the map f/ such that the code c/ is also absolutely optimal.

31. Consider two zero-mean Gaussian random variables X1 and X2 with (positive) variances P1 and P2 , respectively: X1 ~ N(0, P1 ) and X2 ~ N(0, P2 ). Let

Y1 = X1 + Z and Y2 = X2 + Z

where Z ~ N(0, σ2 ) is a zero-mean Gaussian random variable with (positive) variance σ 2 and is independent from both X1 and X2 .

(a) Determine in nats D(X1 lX2 ) and D(Y1 lY2 ) in terms of P1 , P2 and σ 2 .

(b) Compare D(X1 lX2 ) to D(Y1 lY2 ) and comment qualitatively.

32. Consider a discrete memoryless channel with input alphabet x = n0, 1, . . . , q, q + 1, q + 2, q + 3},

output alphabet

y = n0, 1, . . . , q, q + 1, q + 2, q + 3, q + 4}, where q > 2 is a ﬁxed integer, and transition distribution PY |x given by

1 - α if y = x and x e n0, 1, . . . , q - 1}

α if y = q and x e n0, 1, . . . , q - 1}

PY |x(ylx) =

∈ if y = x and x e nq + 1, q + 2, q + 3}

where 0 5 α, ∈ 5 1.

(a) Find the capacity C of the channel (in bits) in terms of α , ∈ and q .

(b) Determine the values of α and ∈ that yield the largest possible value of C.

(c) We wish to transmit an arbitrary source nUi } with memory via a rate-one source- channel code over this channel with parameters α and ∈ obtained in part (b). What is the largest possible size of the source alphabet for which the source can be reliably sent of the channel? Describe the rate-one source-channel code which can achieve such communication.

33. Given a ﬁxed positive integer n > 1, consider two probability distributions p = (p1 , . . . , pn ) and q = (q1 , . . . , qn ) with support x = n1, 2, . . . , n}; i.e. pi , qi > 0 for i = 1, . . . , n and Lpi = L qi = 1. Given α > 0 and α 1, the R´enyi cross-entropy between p and

q of order α is given by

Hà (p; q) := log2 │ pi qi(à) -1 ← .

(a) Prove whether or not Hà (p; q) is non-increasing in α .

(b) Determine limà→ 1 Hà (p; q).