
DSC 212 — Probability and Statistics for Data Science

Lecture 1

January 10, 2023

Probability is the language of uncertainty.  We will formally introduce mathematical objects needed to define probability rigorously.

1.1    Axioms of Probability

Definition 1. A sample space  (Ω) is the set of all possible outcomes of an experiment.

Definition 2. An event A ⊂ Ω is a set of outcomes. A nonempty collection of events F is called a σ-algebra (or σ-field) if it satisfies:

1. If an event A ∈ F, then Ac ∈ F,

2. If a sequence of sets {Ai} are in F, then their union ∪iAi ∈ F.

(These two conditions, together with nonemptiness, imply that Ω ∈ F and ∅ ∈ F.)

A tuple (Ω, F) consisting of a sample space Ω and an associated σ-algebra F is called a measurable space. If F is a σ-algebra, we refer to it as an event space, since it consists of all events whose probabilities we may be interested in.

Definition 3. A measure is a function P : F → R that assigns values to events and satisfies:

1. Non-negativity: P(A) ≥ P(∅) = 0

2. Additivity: If Ai ∈ F are disjoint, then P(∪iAi) = Σi P(Ai).

Definition 4. A measure P is a probability measure if it satisfies a normalization condition P(Ω) = 1.

Together, (Ω, F, P) is called a probability space.

Example  1. Consider the roll of an ancient Egyptian astragalus die, made from the ankle bone of a sheep. The sample space of possible outcomes is Ω = {1, 3, 4, 6}, based on points assigned to different sides of the bone. One can verify that the following set is a σ-algebra:

F = {{1, 6} , {3} , {4} , {1, 3, 6} , {1, 4, 6} , {3, 4} , Ω , ∅} .

A probability measure P : F → [0, 1] is determined by the values it assigns to {3} and {4}. Indeed, if P({3}) = p3 and P({4}) = p4, then P({1, 6}) = 1 − p3 − p4, and each point in the set {(p3, p4) | p3 ≥ 0, p4 ≥ 0, p3 + p4 ≤ 1} defines a unique probability measure on (Ω, F).

Note  1.  A subtle point here is that P(3) is undefined, because the function P only accepts elements of F as inputs. We often mean P({3}) when we say P(3), but this is an abuse of notation.

Note  2.  Observe that P({1}) is undefined since {1} ∉ F, the domain of the measure P. To talk about events of the form {1}, {6}, {1, 3}, . . . and other sets in 2^Ω − F, one would have to define a bigger σ-algebra.
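For a finite collection such as the one in Example 1, the closure conditions of Definition 2 can be checked mechanically. A minimal sketch in Python (the function name and set encoding are our own):

```python
# Event space from Example 1, over the astragalus outcomes Omega = {1, 3, 4, 6}.
Omega = frozenset({1, 3, 4, 6})
F = {frozenset(s) for s in
     [{1, 6}, {3}, {4}, {1, 3, 6}, {1, 4, 6}, {3, 4}, Omega, set()]}

def is_sigma_algebra(F, Omega):
    """Check closure under complement and union; for a finite F this suffices."""
    if Omega not in F:
        return False
    for A in F:
        if Omega - A not in F:        # closed under complement
            return False
        for B in F:
            if A | B not in F:        # closed under (finite) union
                return False
    return True

print(is_sigma_algebra(F, Omega))  # True
```

Adding a set like {1} to F without also adding its complement {3, 4, 6} makes the check fail, which is exactly the point of Note 2.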

1.2    Properties of a probability measure

1. P(Ac ) = 1 − P(A), for any event A ⊂ Ω .

Proof.  Observe that Ω = A ∪ Ac  for any set A ⊂ Ω . Furthermore, A and Ac  are disjoint. Hence using the additivity property we have

P(Ω) = P(A ∪ Ac) = P(A) + P(Ac) = 1

where the last equality follows by the normalization property. This proves the claim.         

2.  (Inclusion-exclusion) P(A ∪ B) = P(A) + P(B) − P(A ∩ B), for any events A, B ⊂ Ω.

Proof.  We know A − B and A ∩ B are disjoint sets, and the union of these sets is A; by applying additivity we get

P(A) = P(A − B) + P(A ∩ B).                                           (1.1)

Similarly, B − A and A ∩ B are disjoint sets and the union of these sets is B, whereby P(B) = P(B − A) + P(A ∩ B).

Finally, consider A − B , B − A, and A ∩ B . These 3 sets are disjoint, and the union of these sets is A ∪ B . From the additivity property again, we get,

P(A ∪ B) = P((A − B) ∪ (B − A) ∪ (A ∩ B)) = P(A − B) + P(B − A) + P(A ∩ B).

The claim follows immediately by plugging in the above two equations.                             

3. P(A ∪ B) ≤ P(A) + P(B), for any events A, B ⊂ Ω.

Proof.  This follows immediately from the previous claim since P(A ∩ B) ≥ 0.                      
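All three properties can be sanity-checked by brute force on a small probability space. A sketch assuming a fair six-sided die with the uniform measure (the events A and B are arbitrary choices of ours):

```python
from fractions import Fraction

# Uniform probability measure on a fair six-sided die: P(A) = |A| / 6.
Omega = frozenset(range(1, 7))
def P(A):
    return Fraction(len(A), len(Omega))

A = {1, 2, 3}
B = {3, 4}

print(P(Omega - A) == 1 - P(A))            # True: complement rule
print(P(A | B) == P(A) + P(B) - P(A & B))  # True: inclusion-exclusion
print(P(A | B) <= P(A) + P(B))             # True: union bound
```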

1.3    Independent events

Events A and B are independent, denoted by A ⊥⊥ B, if

P(A ∩ B) = P(A)P(B)

If A ⊥⊥ B, then similarly A ⊥⊥ Bc, Ac ⊥⊥ B, and Ac ⊥⊥ Bc (see Claim 1 below). Events A, B, C are (mutually) independent if they are pairwise independent and

P(A ∩ B ∩ C) = P(A)P(B)P(C).

Note 3. Independence among multiple events implies pairwise independence. The converse is false.

Claim  1.  If A ⊥⊥ B, then A ⊥⊥ Bc.

Proof.  To prove, A ⊥⊥ Bc , observe that A ∩ Bc  = A − B . One can draw a Venn diagram to verify this. We recall the identity from (1.1) that

P(A ∩ Bc) = P(A − B) = P(A) − P(A ∩ B).

Using A ⊥⊥ B, we have

P(A ∩ Bc) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(Bc),

which proves the claim.

Example 2. Suppose we throw a pair of 6-sided fair dice such that all outcomes are equally likely. Find independence relations among the following events:

A: the 1st throw was a‘3’, B: the sum of throws equals 7, C: the sum of throws equals 5.

Solution: Observe that the sample space is Ω = {(a, b) : 1 ≤ a ≤ 6, 1 ≤ b ≤ 6} with 36 outcomes. Next, we have

A = {(3, b) : 1 ≤ b ≤ 6}  =⇒ P(A) = 6/36 = 1/6

B = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)}  =⇒ P(B) = 6/36 = 1/6

C = {(1, 4), (4, 1), (2, 3), (3, 2)}  =⇒ P(C) = 4/36 = 1/9

A ∩ B = {(3, 4)}  =⇒ P(A ∩ B) = 1/36

A ∩ C = {(3, 2)}  =⇒ P(A ∩ C) = 1/36

B ∩ C = ∅  =⇒ P(B ∩ C) = 0

By checking the independence condition, we get A ⊥⊥ B (since P(A)P(B) = 1/36 = P(A ∩ B)), A ̸⊥⊥ C (since P(A)P(C) = 1/54 ≠ 1/36 = P(A ∩ C)), and B ̸⊥⊥ C (since P(B)P(C) = 1/54 ≠ 0 = P(B ∩ C)).
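The three independence checks can be verified by enumerating all 36 outcomes. A small sketch (the event encodings are our own):

```python
from fractions import Fraction

# All 36 equally likely outcomes of a pair of fair six-sided dice.
Omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
def P(E):
    """Probability of the event {w in Omega : E(w)} under the uniform measure."""
    return Fraction(sum(1 for w in Omega if E(w)), len(Omega))

A = lambda w: w[0] == 3            # first throw is a 3
B = lambda w: w[0] + w[1] == 7     # sum of throws equals 7
C = lambda w: w[0] + w[1] == 5     # sum of throws equals 5

print(P(lambda w: A(w) and B(w)) == P(A) * P(B))  # True:  A and B independent
print(P(lambda w: A(w) and C(w)) == P(A) * P(C))  # False: A and C dependent
print(P(lambda w: B(w) and C(w)) == P(B) * P(C))  # False: B and C dependent
```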

1.4    Conditional Probability, Bayes Rule and its applications

Definition 5. The probability of an event A occurring given that event B has occurred is denoted P(A | B) and defined as

P(A | B) = P(A ∩ B) / P(B).

Note  4.  The above expression is well-defined only if P(B)  >  0. However we can always write P(A ∩ B) = P(A | B)P(B).

Remark  1.  P(· | B) : F → [0, 1] is a probability measure defined for all events A ∈ F, and satisfies all the axioms.

If {Ai} are mutually exclusive and collectively exhaustive, i.e., the Ai are disjoint and ∪iAi = Ω, then

P(Ai | B) = P(B | Ai)P(Ai) / P(B) = P(B | Ai)P(Ai) / Σj P(B | Aj)P(Aj)        (Bayes Rule)

for all B ∈ F with P(B) > 0.

The denominator is P(B). The summation results from the fact that the B ∩ Aj are disjoint and B = ∪j(B ∩ Aj). This leads to the identity P(B) = Σj P(B ∩ Aj) = Σj P(B | Aj)P(Aj).
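For a finite partition, Bayes rule is a one-liner. A sketch (the function and argument names are our own), where prior[j] plays the role of P(Aj) and likelihood[j] that of P(B | Aj):

```python
def bayes(prior, likelihood, i):
    """Posterior P(A_i | B) for a partition {A_j}: the denominator P(B)
    is computed by total probability as sum_j P(B | A_j) P(A_j)."""
    p_B = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / p_B

# Two equally likely hypotheses; B is 9x more likely under the first one.
post = bayes({0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}, 0)
print(round(post, 3))  # 0.9
```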

1.4.1    Example:  Did you say what I just heard?

The Morse code has 2 letters in its alphabet, {•, —}. Consider transmitting a message encoded in Morse code over a noisy channel with the following channel characteristics (crossover probabilities):

P(• received | — transmitted) = P(• R | — T) = p,

P(— received | • transmitted) = P(— R | • T) = q.

A • was received; what is the probability that a • was transmitted? Assume ‘•’ and ‘—’ are transmitted with equal probability, i.e., P(• T) = P(— T) = 1/2.

Solution: We wish to calculate P(• T | • R). To that end we apply Bayes rule:

P(• T | • R) = P(• T ∩ • R) / P(• R)

= P(• R | • T)P(• T) / P(• R)

= P(• R | • T)P(• T) / [P(• R | • T)P(• T) + P(• R | — T)P(— T)].

If a ‘•’ were transmitted, either a ‘—’ or a ‘•’ is received, i.e., P(— R | • T) + P(• R | • T) = 1, which yields P(• R | • T) = 1 − q. Substituting this value back, we get

P(• T | • R) = (1 − q)(1/2) / [(1 − q)(1/2) + p(1/2)] = (1 − q) / (1 − q + p).
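This channel calculation is easy to evaluate numerically. Writing p = P(• R | — T) and q = P(— R | • T) for the two crossover probabilities, a sketch with illustrative values p = q = 1/8 (our assumption, not values from the lecture):

```python
from fractions import Fraction

def posterior_dot(p, q):
    """P(dot T | dot R) for a binary channel with crossover probabilities
    p = P(dot R | dash T), q = P(dash R | dot T), and equal priors of 1/2."""
    num = (1 - q) * Fraction(1, 2)   # P(dot R | dot T) P(dot T)
    den = num + p * Fraction(1, 2)   # ... + P(dot R | dash T) P(dash T)
    return num / den

# Assumed crossover values, for illustration only:
print(posterior_dot(Fraction(1, 8), Fraction(1, 8)))  # 7/8
```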

1.4.2    Monty hall problem

The following excerpt appears on wikipedia.org/Monty Hall problem.

“Suppose you’re on a game show, and you’re given the choice of three doors:  Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 2, which has a goat. The host then says to you, “Do you want to pick door No. 3 instead?”  Is it to your advantage to switch your choice?”

Solution: As a contestant who wants to win the car, we want to calculate the probability that the car is behind door 1 given that host revealed to us that there is a goat behind door 2.

Note  5.  The host knows what lies behind each door and cannot reveal a door with the car behind it. Indeed if the car was behind door 3, the host would have no choice but to reveal door 2, since we picked 1. However if the car is behind door 1, the host can randomly reveal one of doors 2 or 3.

Let C = i denote the event that the car is behind door i ∈ {1, 2, 3}. Clearly P(C = i) = 1/3. To calculate the probability of our first choice, we need P(C = 1 | Host revealed door 2), which can be calculated as

P(C = 1 | Host revealed door 2) = P(Host revealed door 2 | C = 1) P(C = 1) / P(Host revealed door 2).

Next, following the note above, we have P(Host revealed door 2 | C = 1) = 1/2, and obviously, P(Host revealed door 2 | C = 2) = 0, and finally P(Host revealed door 2 | C = 3) = 1. Since the priors P(C = i) = 1/3 are all equal, they cancel from numerator and denominator, and putting this together we get

P(C = 1 | Host revealed door 2) = P(Host revealed door 2 | C = 1) / Σi P(Host revealed door 2 | C = i) = (1/2) / (1/2 + 0 + 1) = 1/3.

Similarly, we get

P(C = 3 | Host revealed door 2) = P(Host revealed door 2 | C = 3) / Σi P(Host revealed door 2 | C = i) = 1 / (1/2 + 0 + 1) = 2/3.

Therefore we have twice the chance to win the car if we switch (choose door 3 over door 1, our original pick) after the host revealed the goat was behind door 2.
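The 1/3 vs. 2/3 split is also easy to confirm by simulation. A sketch (function names are our own) in which the host follows the rule from Note 5:

```python
import random

def monty_trial(switch, rng):
    """One round: the car is placed uniformly, we pick door 1, and the host
    opens a goat door other than our pick (at random when he has a choice)."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = 1
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

rng = random.Random(0)
n = 100_000
stay = sum(monty_trial(False, rng) for _ in range(n)) / n
swap = sum(monty_trial(True, rng) for _ in range(n)) / n
print(f"stay: {stay:.3f}, switch: {swap:.3f}")  # roughly 0.333 and 0.667
```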