闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

DSC 212 — Probability and Statistics for Data Science

January 24, 2023

Lecture 5

5.1 IID random variables

Definition 1 (Independent and Identically Distributed (i.i.d.) random variable) . If random vari- ables X1 , ...Xn are independent and have the same marginal distribution, then they are referred to as i.i.d.

FXi (t) = FX1 (t) Ai

Example 1. X1 , ...Xn are i.i.d. Bernoulli random variables with parameter p. For Bernoulli

given by

PX1 ({X1 = t}) = pt(1 − p)(1−t) where t ∈ {0, 1}

Due to the i.i.d. property, the joint distribution is the following probability,

n n

P({X1 = t1 ,X2 = t2 , ...,Xn = tn}) =”P({Xi = ti}) =” pti(1 − p)1−ti

i=1 i=1

= p对 ti (1 − p)对(1−ti) = p对 ti (1 − p)n−对 ti

Property 1. For i.i.d. random variables Xi, where each Xihas variance σ2 , i.e. EXi(2)−(EXi)2 = σ 2 , Var(z Xi) = n · σ 2 ,

Proof. First define, Y = z Xi and let EX = µ, the variance Var(z Xi) can be expressed as EY2 − (EY)2 . Let us consider each of the terms separately. The first term can be expressed as,

EY2 = E(X1 + X2 + ... + Xn)(X1 + X2 + ... + Xn)

= E(X1(2) + X2(2) + ... + Xn(2) + 2X1X2+ 2X1X3 + ...2X1Xn+ ...)

之 EXi(2)

i=1

" 物

之

independent, ij

identically distributed

= nEX1(2) + 之 2 EXiEXj

" 物

identically distributed

= nEX1(2) + 2 ( )2(n) EX1EX1

= nEX1(2) + n(n − 1)(EX1)2

The second term (EY)2 can be written as,

(EY)2 = (E(X1 + X2 + ... + Xn))2 = (EX1 + .... + EXn)2 = (nEX1)2

(by linearity) (by identically distributed)

Finally, the variance is

Var(Y) = nEX1(2) + n(n − 1)(EX1)2 − n2 (EX1)2

= nEX1(2) + 2 一2− n(EX1)2 − 2 一2

= n(EX1(2) − (EX1)2 )

= n · Var(X1)

Thus, the variance of X1 +X2 + ... +Xn is n times the variance of Xi if X1,X2 , . . . ,Xn are i.i.d.

5.2 Probability Inequalities

Theorem 1 (Markov’s inequality). For any non-negative random variable X, suppose that EX exists . For any t > 0,

P(X ≥ t) ≤

Proof.

EX = \0 ∞ ufx(u)du = \0t ufx(u)du+ \t ∞ ufx(u)du

石、

positive

≥ \t ∞ ufx(u)du ≥ t \t ∞ fx(u)du = t · P(X ≥ t)

Theorem 2 (Chebyshev’s inequality). For random variable X where EX2 < ∞ and let EX = µ ,

then

Var(X)

Proof. Let Y = |X − µ|. Clearly Y is non-negative, whereby applying Markov’s inequality we get,

P(|X − µ| ≥ t) = P(Y ≥ t) = P(Y2 ≥ t2 ) ≤ = =

Remark 1 . Thus, P({X > µ + 3σ} or {X < µ − 3σ}) = P(|X − µ| ≥ 3σ) ≤ =

Another useful inequality is the following. Let Y = X1 +X2 + ... +Xn where X1,X2 , ...,Xn are i.i.d. random variables. Thus, EY = nEX1 ,

Var(Y) n · Var(X1)

t2 t2

By setting t = nu, we obtain the following

P(' − EX1 ' > u) ≤ =

which implies that the difference between the average of i.i.d. random variables and the expectation goes to zero as n goes to infty, i.e. limn→∞P(' − EX1' > u) → 0. This is known as the weak law of large numbers and is formally defined later.

Definition 2 (Convergence in Probability). A sequence of random variables X1 ...Xn(i.i.d. assump- tion is not required) is said to converge in probability to random variable X, written as Xn X , if

lim P(|Xn− X| ≥ t) = 0

Theorem 3 (Weak Law of Large numbers). If X1 ...Xn are i.i. d. random variables each with expected value µ, then

对 Xi µ

Remark 2 . This justifies why we refer to the expected value as the mean.

In practice however, we often only have finite number of samples, whereby we have to rely on probability inequalities to make statements with high confidence.

Example 2. Consider a tossing a coin, P({H}) = p. Define a sequence of random variables Xi

as Xi = t(t)h(h) . Define, Yn = X1 + X2 + ...Xn (the total number of heads

We want to know if the coin is biased or not.

Using the weak law of large numbers, EX1 = P(X1 = 1) = p.

However, tossing infinite number of coins is not feasible. What if we want to know the answer for a smaller number of tosses, e.g. for n = 100?

Here ∈ [0, 1] is random variable and P(' − p ' > t) ≤ (by Chebyshev’s inequality)

Var(X1) = EX1(2) −(EX1)2 = p − p2 = p(1 − p)

石如

Xi(2) =X1

Given p = 0.7 and we want an answer that is within an interval. For example, for t = 0.1

P(' − p ' > 0.1) ≤ = 0.21. Which implies that with probability at most .21, the fraction

of heads lies in the interval [0.6, 0.8].

If we are willing to increase the size of the interval, we can be more confident, since

P(' − p ' > 0.15) ≤ = 0.21/2.25 = 0.0934.

The above statement means, we probability ≥ 91%, we can say the average fraction lies in [0.55, 0.85].

More generally, for some α ∈ [0, 1], with probability at least 1 − α, the fraction of Heads will

lie in the interval [p − ^ p),p + ^ ].

Substituting p = 0.5,n = 100,α = .09, we get ^ p),p + ^ ≈ 0.17 if we observe fewer

than 0.33 fraction of heads or more than 0.67 fraction of heads, we can conclude that the coin is biased with probability 91%.

If on the other hand we had n = 10000 coin tosses, we can conclude that the coin is biased with probability 91%, if we observe fewer than 0.483 fraction of heads or more than 0.517.

A tight confidence interval can be constructed with more specialized inequalities such as Ho- effding’s inequality.

2023-02-23

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths

maple