DSC 212 — Probability and Statistics for Data Science
January 26, 2023
Lecture 6
6.1 Recap: Convergence in Probability
A sequence of random variables Xn converges in probability to X, written Xn →P X, if for every ε > 0,

limn→∞ P(|Xn − X| > ε) = 0.
Theorem 1 (Weak Law of Large Numbers (WLLN)). If Xi are i.i.d. random variables with mean EXi = µ, then the sample mean

X̄n = (1/n) Σ_{i=1}^n Xi    (6.1)

defines a new sequence of random variables, and X̄n →P µ, where µ is a constant.
Figure 6.1: Illustration of the distribution of X̄n. It concentrates around the mean µ.
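The concentration in Figure 6.1 can be checked with a short simulation (an illustrative sketch, not part of the notes; Uniform(0, 1) draws with µ = 0.5 are an assumed example):

```python
# WLLN sketch (assumed example): sample means of i.i.d. Uniform(0, 1)
# draws concentrate around mu = 0.5 as n grows.
import random

random.seed(0)

def sample_mean(n):
    """Average of n i.i.d. Uniform(0, 1) random variables."""
    return sum(random.random() for _ in range(n)) / n

for n in [10, 1000, 100000]:
    print(n, sample_mean(n))  # deviations from 0.5 shrink as n grows
```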
Note 1. If g: R → R is a measurable function, i.e., g(Xi) is a valid random variable, and Eg(Xi) = ḡ, then we have (1/n) Σ_{i=1}^n g(Xi) →P ḡ.
6.2 Central Limit Theorem (CLT)
Definition 1 (Convergence in Distribution). Let Xn be a sequence of random variables, let Fn be the CDF of Xn, and let F be the CDF of X. Then Xn ⇒ X (Xn converges in distribution to X) if limn→∞ Fn(t) = F(t) for all t at which F is continuous.
Note 2 . We assume nothing about Xi, except the existence of the mean and variance.
Remark 1. If Xn ⇒ X, then for any bounded continuous function g, Eg(Xn) → Eg(X).
Theorem 2 (Central Limit Theorem). If Xi are i.i.d. random variables with EXi = µ and Var(Xi) = σ², define

X̄n = (X1 + X2 + · · · + Xn)/n,    Zn = √n (X̄n − µ)/σ.

Then Zn ⇒ Z ∼ N(0, 1), i.e., Zn converges in distribution to Z, which has the Standard Normal Distribution.
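A quick empirical check of the theorem (an illustrative sketch, not from the lecture; Bernoulli(1/2) with µ = σ = 1/2 is an assumed example):

```python
# CLT sketch: Zn = sqrt(n)(X̄n − µ)/σ for Xi ~ Bernoulli(1/2) should be
# approximately N(0, 1); we compare the empirical CDF of Zn with Φ.
import random
from statistics import NormalDist

random.seed(1)
n, reps = 1000, 2000
mu, sigma = 0.5, 0.5

def z_n():
    s = sum(random.random() < 0.5 for _ in range(n))  # Binomial(n, 1/2) count
    return (s / n - mu) / (sigma / n ** 0.5)

zs = sorted(z_n() for _ in range(reps))
Phi = NormalDist().cdf
# Kolmogorov-type distance between the empirical CDF and Φ.
max_gap = max(abs((i + 1) / reps - Phi(z)) for i, z in enumerate(zs))
print(max_gap)  # small for large n and reps
```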
Remark 2. Hence any statistic of Zn can be approximated by the corresponding statistic of Z:

Eg(Zn) → Eg(Z) = ∫ g(t) (1/√(2π)) exp(−t²/2) dt.
Figure 6.2: Deviations around the mean. We multiply by √n to zoom into Figure 6.1.
With n = 125 programs and mean µ = 5 errors per program (and σ = √5, so that √125 · 0.5/√5 = 2.5):

P(average error per program ≤ 5.5) = P(X̄n ≤ 5.5) = P(√n (X̄n − µ)/σ ≤ √n (5.5 − µ)/σ)
≈ P(Z ≤ √n (5.5 − µ)/σ) = P(Z ≤ √125 (5.5 − 5)/√5)
= FZ(2.5) = ∫_{−∞}^{2.5} (1/√(2π)) exp(−t²/2) dt ≈ 0.9938.
Conclusion 1. With probability 0.9938, there are at most 5.5 errors per program on average.
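The value FZ(2.5) ≈ 0.9938 can be verified with the standard normal CDF from the Python standard library (a sketch; σ = √5 is the value implied by the computation, and `statistics.NormalDist` requires Python 3.8+):

```python
# Verify FZ(2.5) for the program-errors example; n = 125 and mu = 5 are
# from the notes, sigma = sqrt(5) is the value implied by FZ(2.5).
from statistics import NormalDist

n, mu, sigma = 125, 5.0, 5 ** 0.5
z = n ** 0.5 * (5.5 - mu) / sigma  # sqrt(125) * 0.5 / sqrt(5) = 2.5
p = NormalDist().cdf(z)
print(round(z, 6), round(p, 4))  # 2.5 0.9938
```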
Definition 2. Let Dn be the empirical average of n samples. To find a threshold t that Dn exceeds with probability at most 1%, solve

P(√n (Dn − µ)/σ < √n (t − µ)/σ) = 0.99,

which gives t ≥ µ + (σ/√n) · FZ⁻¹(0.99).
Example 2. Toss an unbiased (P(Heads) = 1/2) coin 1000 times.
(1) What is the probability of seeing ≥ 600 Heads?
(2) E(total heads) = 500. Find a t such that

total heads ∈ [500 − t, 500 + t]

with probability 99%, or 99.99%.
Xi ∼ Bernoulli(1/2),    Y = Σ_{i=1}^{1000} Xi ∼ Binomial(1000, 1/2).    (6.2)
P(Y = k) = (1000 choose k) · 2^(−1000)

P({Y ≥ 600}) = Σ_{k=600}^{1000} (1000 choose k) · 2^(−1000)    (unwieldy to compute by hand)
Instead, standardize using µ = 1/2 and σ = 1/2 for Xi ∼ Bernoulli(1/2):

P(Y ≥ 600) = P(Y/1000 ≥ 0.6)    (6.3)
= P(√1000 (Y/1000 − 0.5)/σ ≥ √1000 (0.6 − 0.5)/σ)    (6.4)
= P(√1000 (Y/1000 − 0.5)/σ ≥ 2√10)    (6.5)
≈ P(Z ≥ 2√10)    (6.6)
= 1 − FZ(2√10) ≈ 10^(−10).    (6.7)
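For comparison, the exact binomial tail and the CLT approximation can both be computed numerically (a sketch, not from the notes; `math.comb` requires Python 3.8+):

```python
# Exact Binomial(1000, 1/2) tail P(Y >= 600) versus the CLT value 1 − FZ(2√10).
from math import comb
from statistics import NormalDist

exact = sum(comb(1000, k) for k in range(600, 1001)) / 2 ** 1000
approx = 1 - NormalDist().cdf(2 * 10 ** 0.5)  # 1 − FZ(2√10)
print(exact, approx)  # both on the order of 1e-10
```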
Remark 4. FZ(−t) = 1 − FZ(t) for any PDF symmetric about 0. See Figure 6.3 for an illustration.
The event Y ∈ [500 − t, 500 + t] is the same as −(t/50)√10 ≤ (Y − 500)/(5√10) ≤ (t/50)√10, since σ√1000 = 5√10. Hence

P(Y ∈ [500 − t, 500 + t])    (6.8)
≈ P(Z ∈ [−(t/50)√10, (t/50)√10])    (6.9)
= P(Z ≤ (t/50)√10) − P(Z ≤ −(t/50)√10)    (6.10)
= FZ((t/50)√10) − FZ(−(t/50)√10)    (6.11)
= 2 FZ((t/50)√10) − 1,    (6.12)

using FZ(−(t/50)√10) = 1 − FZ((t/50)√10).    (6.13)
Figure 6.3: Probability of a symmetric distribution around the origin.
Setting 2 FZ((t/50)√10) − 1 = 0.99 or 0.9999 gives

FZ((t/50)√10) = 0.995 or 0.99995,
t = 5√10 · FZ⁻¹(0.995) or t = 5√10 · FZ⁻¹(0.99995).
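The final step can be carried out numerically with the inverse normal CDF from the standard library (a sketch; `statistics.NormalDist` requires Python 3.8+):

```python
# Solve 2 FZ((t/50)√10) − 1 = q for t, i.e. t = 5√10 · FZ⁻¹((1 + q)/2).
from statistics import NormalDist

inv = NormalDist().inv_cdf
for q in (0.99, 0.9999):
    t = 5 * 10 ** 0.5 * inv((1 + q) / 2)
    print(q, round(t, 1))
```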
6.3 Delta Method
Suppose √n (X̄n − µ)/σ ⇒ N(0, 1). Let g be a differentiable function with g′(µ) ≠ 0. Then,

√n (g(X̄n) − g(µ)) / (σ |g′(µ)|) ⇒ N(0, 1).    (6.14)
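A simulation can illustrate the delta method (an illustrative sketch with assumed example choices: Xi ∼ Uniform(0, 1), so µ = 1/2, σ² = 1/12, and g(x) = x² with g′(µ) = 1):

```python
# Delta method sketch: sqrt(n)(g(X̄n) − g(µ)) / (σ |g'(µ)|) should be
# approximately N(0, 1) for large n.
import random
from statistics import mean, stdev

random.seed(2)
n, reps = 2000, 1000
mu, sigma = 0.5, (1 / 12) ** 0.5  # Uniform(0, 1): mean 1/2, variance 1/12
g, dg = (lambda x: x * x), 1.0    # g(x) = x^2, g'(mu) = 2 * mu = 1

def stat():
    xbar = sum(random.random() for _ in range(n)) / n
    return n ** 0.5 * (g(xbar) - g(mu)) / (sigma * abs(dg))

samples = [stat() for _ in range(reps)]
print(mean(samples), stdev(samples))  # close to 0 and 1
```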
2023-02-25