
DSC 212 — Probability and Statistics for Data Science

January 19, 2023

Lecture 4

4.1    Independence of Random Variables

In the last class we defined independence between discrete random variables. We say X ⊥⊥ Y if, for all u and v, PXY({X = u} ∩ {Y = v}) = PX({X = u}) PY({Y = v})

Let us quickly verify the types of all the objects in the equation above. The probability space on the LHS is (RX × RY, B(RX × RY), PXY), whereas on the right it is (RX, B(RX), PX) and (RY, B(RY), PY), where B(S) is the Borel field of a space S. These individual probability spaces have themselves been defined based on some other underlying probability spaces (Ω1, F1, P1) and (Ω2, F2, P2), and random variables X : Ω1 → RX and Y : Ω2 → RY. Indeed, independence is really about the interaction between the randomness of outcomes in Ω1 and Ω2, given by a joint distribution P12 over events from Ω1 × Ω2.

In general, the two random variables are independent if their joint CDF can be written as the product of their marginal CDFs.

FXY(u,v) = FX(u) · FY(v)                                               (4.1)

where FXY(u,v) = PXY({X ≤ u} ∩ {Y ≤ v}) = P12({(ω1, ω2) ∈ Ω1 × Ω2 | X(ω1) ≤ u, Y(ω2) ≤ v}), FX(u) = PX({X ≤ u}), and FY(v) = PY({Y ≤ v}).
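As a quick numerical check of (4.1), here is a minimal Python sketch (assuming NumPy) that estimates the joint CDF and the product of the marginal CDFs by Monte Carlo for two independent uniform random variables:

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)     # X ~ Uniform[0, 1]
y = rng.uniform(0.0, 1.0, n)     # Y ~ Uniform[0, 1], drawn independently of X

u, v = 0.3, 0.7
joint = np.mean((x <= u) & (y <= v))          # empirical F_XY(u, v)
product = np.mean(x <= u) * np.mean(y <= v)   # empirical F_X(u) * F_Y(v)
print(joint, product)                         # both close to 0.3 * 0.7 = 0.21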

Recall that for a pair of random variables (X,Y), their distribution function is described by their CDF,

FXY(u,v) = PXY(X ≤ u, Y ≤ v)                                           (4.2)

For continuous random variables, there exists a more convenient representation than the CDF.

4.1.1    Joint Density

(X,Y) are jointly continuous random variables if there exists a function fXY : R² → R, called the joint probability density function (joint PDF), such that

PXY(A) = ∫∫_A fXY(u,v) du dv                                           (4.3)

The PDF must satisfy

fXY(u,v) ≥ 0                                               (non-negativity)

∫∫ fXY(u,v) du dv = 1                                       (normalization)
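As an illustration, the function f(u,v) = u + v on the unit square [0, 1]², and zero elsewhere, is a valid joint PDF. A minimal Python sketch (assuming NumPy; this density is an illustrative choice) that checks the two conditions with a Riemann sum:

import numpy as np

f = lambda u, v: u + v                    # illustrative joint density on [0, 1]^2

m = 1000
grid = (np.arange(m) + 0.5) / m           # midpoints of m equal subintervals of [0, 1]
U, V = np.meshgrid(grid, grid)
print(np.sum(f(U, V)) / m**2)             # double integral over [0, 1]^2: close to 1
print(np.all(f(U, V) >= 0))               # non-negativity on the grid: True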

4.1.2    Marginal Density

Based on the joint PDF, one can derive the marginal densities of X and Y, denoted fX and fY respectively:

fX(u) = ∫ fXY(u,t) dt

fY(v) = ∫ fXY(t,v) dt

Knowing only the marginal densities fX and fY does not determine the joint density fXY, unless X and Y are independent.
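Continuing with the illustrative density f(u,v) = u + v on [0, 1]², whose exact marginal is fX(u) = u + 1/2, here is a short Python sketch (assuming NumPy) of computing the marginal numerically:

import numpy as np

f = lambda u, v: u + v                    # illustrative joint density on [0, 1]^2

m = 1000
v_grid = (np.arange(m) + 0.5) / m

def marginal_x(u):
    # f_X(u) = integral of f_XY(u, v) over v, approximated by a Riemann sum
    return np.sum(f(u, v_grid)) / m

for u in (0.1, 0.5, 0.9):
    print(u, marginal_x(u))               # close to u + 0.5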

For continuous random variables, one can verify that if X, Y are independent, then

fXY(u,v) = fX(u) · fY(v)

Suppose X, Y are independent. Then E[g(X)h(Y)] = E[g(X)] E[h(Y)]:

E[g(X)h(Y)] = ∫∫ g(u)h(t) fXY(u,t) du dt

= ∫∫ g(u)h(t) fX(u) fY(t) du dt                                          (c)

= ∫ g(u) fX(u) du · ∫ h(t) fY(t) dt

= E[g(X)] · E[h(Y)]

where (c) follows from the independence of X and Y, which allows fXY to be written as the product of fX and fY.
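A minimal Monte Carlo sketch of this identity in Python (assuming NumPy; the choices of g, h, and the distributions are illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.normal(0.0, 1.0, n)           # X ~ N(0, 1)
y = rng.uniform(0.0, 1.0, n)          # Y ~ Uniform[0, 1], independent of X

g = lambda t: t**2                    # illustrative g
h = lambda t: np.cos(t)               # illustrative h

lhs = np.mean(g(x) * h(y))                 # estimates E[g(X) h(Y)]
rhs = np.mean(g(x)) * np.mean(h(y))        # estimates E[g(X)] E[h(Y)]
print(lhs, rhs)                            # approximately equal since X and Y are independent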

4.2    Transformations of Random Variables

Given a random variable X we are often interested in the distribution of Y = g(X) for some function g .  To find the distribution of Y , we must find its CDF, and then find its PDF by differentiating the CDF.

Example 1. Suppose X is uniformly distributed and is transformed using Y = sin(X):

X ∼ Uniform[−π/2, π/2]

Y = sin(X)

Figure 4.1: (left) The CDF and PDF of X. (right) The transformation Y = sin(X).

CDF: Let us find the probability that Y ≤ t, for some t ∈ [−1, 1]. Consider the following set of equations.

FY(t) = P{Y ≤ t}

= P{sin(X) ≤ t}

= P{X ≤ sin⁻¹(t)}                                                        (b)

= FX(sin⁻¹(t)) = sin⁻¹(t)/π + 1/2

where (b) holds since the sin function is increasing in this domain.

PDF: Let us take the derivative of FY(t), which we found above, to find the PDF, fY(t).

fY(t) = ∂FY(t)/∂t = ∂/∂t [ sin⁻¹(t)/π + 1/2 ] = 1/π · 1/√(1 − t²)
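A short simulation sketch in Python (assuming NumPy) that compares a histogram of Y = sin(X) against the density derived above:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-np.pi / 2, np.pi / 2, 1_000_000)   # X ~ Uniform[-pi/2, pi/2]
y = np.sin(x)                                       # Y = sin(X)

hist, edges = np.histogram(y, bins=50, range=(-0.95, 0.95), density=True)
centers = (edges[:-1] + edges[1:]) / 2
f_y = 1.0 / (np.pi * np.sqrt(1.0 - centers**2))     # f_Y(t) = 1 / (pi * sqrt(1 - t^2))

print(np.max(np.abs(hist - f_y)))   # small, up to Monte Carlo and binning error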

4.2.1    General invertible increasing transformations

Assume g is an invertible and increasing function, and let Y = g(X). Then

FY(t) = P{g(X) ≤ t} = P{X ≤ g⁻¹(t)} = FX(g⁻¹(t))

fY(t) = ∂/∂t FX(g⁻¹(t)) = fX(g⁻¹(t)) · ∂/∂t g⁻¹(t)                       (a)

where (a) follows from the chain rule of differentiation. This general form may be used to verify the above solution for the PDF of Y = sin(X), as shown below.
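For Example 1, g(x) = sin(x) is invertible and increasing on [−π/2, π/2], with g⁻¹(t) = sin⁻¹(t) and fX(u) = 1/π on that interval. Substituting into the formula above,

fY(t) = fX(sin⁻¹(t)) · ∂/∂t sin⁻¹(t) = 1/π · 1/√(1 − t²)

which matches the PDF derived earlier.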

4.3    Conditional Distributions

For a pair of random variables (X,Y) with a joint distribution PXY , the conditional distribution of X | Y is defined as

PX|Y({X = u} | {Y = v}) = PXY({X = u} ∩ {Y = v}) / PY({Y = v})

As in the case of sets, the above equation is well defined only if {Y = v} does not have measure 0, i.e., PY ({Y = v}) > 0.

For each v, the LHS is a distribution of X; hence v is a parameter of this distribution. The axioms of probability only hold for the first argument of a conditional distribution.

For continuous random variables (X,Y) with density fXY, the conditional density of X | Y is

fX|Y(u | v) = fXY(u,v) / fY(v)

which is a valid density function with respect to u. Note that the above function is well defined only for v such that fY(v) > 0.

Notice that if X ⊥⊥ Y, then

PX|Y({X = u} | {Y = v}) = PX({X = u})

fX|Y(u|v) = fX (u)
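As a numerical sketch in Python (assuming NumPy, and reusing the illustrative joint density f(u,v) = u + v on [0, 1]²), one can form the conditional density on a grid and check that it integrates to 1 over u:

import numpy as np

f = lambda u, v: u + v                     # illustrative joint density on [0, 1]^2

m = 2000
u_grid = (np.arange(m) + 0.5) / m
v = 0.4                                    # condition on Y = v, where f_Y(v) > 0

f_y_v = np.sum(f(u_grid, v)) / m           # f_Y(v) by a Riemann sum; exact value is v + 1/2
cond = f(u_grid, v) / f_y_v                # f_{X|Y}(u | v) = f_XY(u, v) / f_Y(v)
print(np.sum(cond) / m)                    # close to 1: a valid density in u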

Example 2 (Max transformation). Given two independent uniform random variables X and Y, find the CDF and PDF of their maximum Z.

X ∼ Uniform[0, 1]

Y ∼ Uniform[0, 1]

Z = max{X,Y }

CDF: Let us find the probability of the event Z ≤ t.  This means that both X and Y are ≤ t. Since X and Y are independent, the joint distribution can be decomposed as a product of the two marginal distributions. Hence we can write

FZ(t) = P{max{X,Y} ≤ t}

= PXY({X ≤ t} ∩ {Y ≤ t})

= PX({X ≤ t}) · PY({Y ≤ t})

= FX(t) · FY(t) = t²          for t ∈ [0, 1]

PDF: The above equation yields that fZ(t) = ∂/∂t FZ(t) = 2t, for t ∈ [0, 1].

Figure 4.2: The joint distribution of two uniform random variables (left). The PDFs of the two uniform random variables and of the maximum transformation (right).
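A quick simulation sketch in Python (assuming NumPy) of the max transformation, comparing the empirical CDF of Z against t²:

import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)
z = np.maximum(x, y)                 # Z = max{X, Y}

for t in (0.25, 0.5, 0.9):
    print(t, np.mean(z <= t), t**2)  # empirical F_Z(t) vs. t^2

# The density f_Z(t) = 2t can likewise be checked against a histogram of z.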

4.4    Some commonly used expectations

4.4.1    Mean

The mean or expectation of X is a constant; it is not random. It is the best single-number estimate of the random variable, in the sense that it minimizes the expected squared error E[(X − c)²] over constants c. We often denote the mean of a random variable X by µX. Recall that, by definition of the expectation, the mean is given by the integral

EX = ∫ t fX(t) dt

where fX  is the PDF of X .
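For instance, a Riemann-sum sketch in Python (assuming NumPy) of EX = ∫ t fX(t) dt for X ∼ Uniform[0, 1], whose PDF is fX(t) = 1 on [0, 1]:

import numpy as np

m = 10_000
t = (np.arange(m) + 0.5) / m     # grid over the support [0, 1]
f_x = np.ones(m)                 # f_X(t) = 1 for Uniform[0, 1]
print(np.sum(t * f_x) / m)       # approximates EX; close to 0.5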

4.4.2    Variance

The variance of a random variable X describes the average squared deviation around the mean, and is denoted σX² since it is non-negative.

σX² = E[(X − EX)²] = E[(X − µX)²] = E|X|² − |EX|²

Observe that (X − EX)² is always non-negative, whereby σX² is also non-negative. Hence,

E|X|² ≥ |EX|².                                                           (4.4)

The square root of the variance, σX, is called the standard deviation of X: the deviation about the mean, which can be considered to be a “standard”.
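A small Monte Carlo sketch in Python (assuming NumPy; the exponential distribution is an illustrative choice) of the identity σX² = E|X|² − |EX|² and of inequality (4.4):

import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)   # illustrative X with mean 2 and variance 4

second_moment = np.mean(x**2)        # estimates E|X|^2
mean_sq = np.mean(x)**2              # estimates |EX|^2
print(second_moment - mean_sq)       # close to Var(X) = 4
print(second_moment >= mean_sq)      # inequality (4.4): True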

4.4.3    Covariance and Correlation

The covariance is defined below.

Cov(X,Y) = E[(X − µX)(Y − µY)] = E[XY − XµY − µXY + µXµY] = E[XY] − µXµY

The inner terms cancel since E[XµY] = E[µXY] = µXµY. When we want to study the interaction between a pair of random variables (X,Y), we define the covariance matrix:

ΣXY = [ E(X − EX)²            E[(X − EX)(Y − EY)] ]  =  [ σX²      ρσXσY ]
      [ E[(Y − EY)(X − EX)]   E(Y − EY)²          ]     [ ρσXσY    σY²   ]

where −1 ≤ ρ ≤ 1 is called the correlation, defined by

ρ = E[(X − µX)(Y − µY)] / (σXσY)                                         (4.5)

The matrix above is also sometimes referred to as a variance-covariance matrix, because it contains the variances along the diagonal and the covariances off the diagonal.

In vector notation, the covariance matrix of a random vector X ∈ Rd is the matrix

Σ = E[(X − EX)(X − EX)⊤] = E[XX⊤] − (EX)(EX)⊤                            (4.6)

We also have the inequality

Σ ⪰ 0   ⇐⇒   E[XX⊤] ⪰ (EX)(EX)⊤                                          (4.7)

which means Σ is a positive semidefinite matrix and v⊤E[XX⊤]v ≥ (E[v⊤X])², for any constant vector v ∈ Rd. Notice that both quantities in the last inequality are scalars.
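A Python sketch (assuming NumPy) that builds an empirical covariance matrix for correlated data and checks (4.6)–(4.7) numerically; the particular construction of (X, Y) and the vector v are illustrative:

import numpy as np

rng = np.random.default_rng(5)
n = 200_000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)     # correlated with x by construction

data = np.stack([x, y])                    # 2 x n array of samples of (X, Y)
sigma = np.cov(data)                       # empirical covariance matrix
print(sigma)                               # variances on the diagonal, covariances off it
print(np.linalg.eigvalsh(sigma) >= -1e-10) # eigenvalues (numerically) non-negative: PSD

v = np.array([1.0, -2.0])                  # an arbitrary constant vector
second = np.mean((v @ data)**2)            # estimates v^T E[XX^T] v
first_sq = np.mean(v @ data)**2            # estimates (E[v^T X])^2
print(second >= first_sq)                  # True, consistent with (4.7)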

Exercise 1. Find the range of ρ .

Solution: The Cauchy-Schwarz inequality states:

|⟨a,b⟩|² ≤ ⟨a,a⟩⟨b,b⟩

|⟨a,b⟩| ≤ ∥a∥ · ∥b∥

Let:

a := X − µX

b := Y − µY

⟨a,b⟩ := Eab

The expectation terms above may be used in the Cauchy-Schwarz inequality. The last step below follows from the definition of the correlation coefficient, ρ.

|E[(X − µX)(Y − µY)]|² ≤ E[(X − µX)²] · E[(Y − µY)²]

|Cov(X,Y)| ≤ σXσY

|ρ| = |Cov(X,Y)| / (σXσY) ≤ 1,   i.e.,   −1 ≤ ρ ≤ 1.
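A quick numerical sketch in Python (assuming NumPy) confirming that the empirical correlation lands in [−1, 1] even for strongly dependent variables:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
for y in (3 * x + 1, -x, x**3, rng.normal(size=100_000)):
    print(np.corrcoef(x, y)[0, 1])   # correlation coefficient; always within [-1, 1]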

4.5    Conditional Expectation

We now consider how to take an expectation given a conditioning event. For a random variable X conditioned on {Y = v},

E[g(X) | Y = v] = EX|Y=v[g(X)] = ∫_R g(t) fX|Y(t | v) dt

The above integral is taken with respect to t, so the expectation depends on v. Writing h(v) = E[g(X) | Y = v], the conditional expectation E[g(X) | Y] = h(Y) is itself a random variable.

Claim  1.  For a pair of random variables (X,Y), we have

EY[EX|Y[g(X) | Y]] = EY[h(Y)] = E[g(X)]

Proof.

Let h(v) = E[g(X) | Y = v]. Then

EY[h(Y)] = ∫_{RY} h(v) fY(v) dv

= ∫_{RY} ( ∫_{RX} g(t) fX|Y(t | v) dt ) fY(v) dv

= ∫_{RX} g(t) ( ∫_{RY} fXY(t,v) dv ) dt

= ∫_{RX} g(t) fX(t) dt

= E[g(X)]

where the third equality swaps the order of integration and uses fX|Y(t | v) fY(v) = fXY(t,v), and the fourth uses the marginal of X, fX(t) = ∫_{RY} fXY(t,v) dv.
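A Monte Carlo sketch of the claim in Python (assuming NumPy; the joint distribution of (X, Y) and the function g are illustrative):

import numpy as np

rng = np.random.default_rng(7)
n = 500_000
y = rng.uniform(0.0, 1.0, n)            # Y ~ Uniform[0, 1]
x = rng.normal(loc=y, scale=1.0)        # X | Y = v  ~  N(v, 1), so X and Y are dependent

g = lambda t: t**2                      # illustrative g

h = y**2 + 1.0                          # closed form here: E[g(X) | Y = v] = v^2 + 1
print(np.mean(h))                       # estimates E_Y[E[g(X) | Y]]
print(np.mean(g(x)))                    # estimates E[g(X)]; the two agree up to Monte Carlo error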