Unit 5: Probability Theorems
This unit builds on the axioms from Unit 4 to derive the fundamental rules for calculating complex probabilities.
5.1 Conditional Probability
Definition
The conditional probability of event A occurring, given that event B has already occurred, is denoted P(A | B). For P(B) > 0, it is defined as P(A | B) = P(A ∩ B) / P(B).
It "restricts" the sample space. We are now only interested in the outcomes where B happened. Out of those, what is the probability that A *also* happened?
- Example: Rolling a fair die. S = {1, 2, 3, 4, 5, 6}.
- Let A = "Get a number > 3" = {4, 5, 6}. P(A) = 3/6.
- Let B = "Get an even number" = {2, 4, 6}. P(B) = 3/6.
- A ∩ B = {4, 6}. P(A ∩ B) = 2/6.
- Question: What is the probability of getting a number > 3, GIVEN that it was an even number? (Find P(A | B)).
- Method 1 (Formula): P(A | B) = P(A ∩ B) / P(B) = (2/6) / (3/6) = 2/3.
- Method 2 (Logic): We know the outcome is B = {2, 4, 6}. This is our new sample space (size 3). Within this set, which outcomes are also in A? {4, 6} (size 2). Therefore, the probability is 2/3.
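The two methods above can be checked by enumeration. The following is a minimal sketch, assuming equally likely outcomes; the helper name `prob` is chosen for illustration and is not part of the notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair die; every outcome is equally likely.
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x > 3}       # "Get a number > 3"
B = {x for x in S if x % 2 == 0}  # "Get an even number"

def prob(event, space):
    """Probability of an event within a space of equally likely outcomes."""
    return Fraction(len(event & space), len(space))

# Method 1: the formula P(A | B) = P(A ∩ B) / P(B)
p_formula = prob(A & B, S) / prob(B, S)

# Method 2: treat B as the new, restricted sample space
p_restricted = prob(A, B)

print(p_formula, p_restricted)  # 2/3 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two methods can be compared with `==` rather than a floating-point tolerance.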
5.2 Multiplication Theorem of Probability
This is just a rearrangement of the conditional probability formula. It's used to find the probability of an intersection (A AND B).
General Rule: P(A ∩ B) = P(A) * P(B | A)
or, equivalently,
P(A ∩ B) = P(B) * P(A | B)
In words: "The probability of A and B both happening is the probability of A happening, *times* the probability of B happening *given that A has already happened*."
- Example: Draw 2 cards from a deck *without replacement*. What is P(King and then Queen)?
- A = 1st card is King. P(A) = 4/52.
- B = 2nd card is Queen.
- We need P(A ∩ B) = P(A) * P(B | A).
- P(B | A) = "Prob. of 2nd being Queen, GIVEN 1st was a King." After taking one King, there are 51 cards left, 4 of which are Queens. So, P(B | A) = 4/51.
- P(A ∩ B) = (4/52) * (4/51) = 16/2652 = 4/663 ≈ 0.006.
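As a sanity check on the card example, one can enumerate every ordered pair of cards drawn without replacement and compare the count with the multiplication rule. This is a sketch only; the `(rank, suit)` encoding of the deck is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding of a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["S", "H", "D", "C"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# All ordered draws of 2 distinct cards (without replacement): 52 * 51 pairs.
draws = list(permutations(deck, 2))

# Favorable draws: first card is a King, second card is a Queen.
favorable = [d for d in draws if d[0][0] == "K" and d[1][0] == "Q"]
p_enumerated = Fraction(len(favorable), len(draws))

# Multiplication rule: P(A ∩ B) = P(A) * P(B | A)
p_rule = Fraction(4, 52) * Fraction(4, 51)

print(p_enumerated, p_rule)  # 4/663 4/663
```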
5.3 Addition Theorem of Probability
This rule is used to find the probability of a union (A OR B).
General Rule (for ANY two events): P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
In words: "We add P(A) and P(B). But if we do, we've double-counted the intersection (A ∩ B), so we must subtract it once."
If A and B are mutually exclusive, then A ∩ B = Ø, so P(A ∩ B) = 0.
The formula simplifies to: P(A ∪ B) = P(A) + P(B). (This is Axiom 3 from Unit 4).
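The addition rule can be illustrated with the die events from 5.1. The sketch below assumes equally likely outcomes; the mutually exclusive pair C and D is a hypothetical choice added for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die
A = {4, 5, 6}         # "number > 3" (from 5.1)
B = {2, 4, 6}         # "even number" (from 5.1)

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

# General rule: subtract the intersection so it is not counted twice.
p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)
print(p_union_direct, p_union_rule)  # 2/3 2/3

# Mutually exclusive events (hypothetical C and D): the intersection is empty.
C = {1, 3}
D = {2}
print(prob(C | D), prob(C) + prob(D))  # 1/2 1/2
```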
5.4 Theorem of Total Probability
This theorem is used to find the "total" probability of an event (A) when we don't know P(A) directly, but we *do* know its conditional probability based on a set of other events.
Let B₁, B₂, ..., Bₙ be a set of events that are mutually exclusive and exhaustive (they form a "partition" of the entire sample space S).
Theorem: P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂) + ... + P(A | Bₙ)P(Bₙ)
which can be written using the multiplication rule as:
P(A) = P(A ∩ B₁) + P(A ∩ B₂) + ... + P(A ∩ Bₙ)
- Example:
- Urn 1 (B₁) has 2 Red, 3 White balls. P(B₁) = 1/2.
- Urn 2 (B₂) has 4 Red, 1 White ball. P(B₂) = 1/2.
- You pick an urn at random and then pick one ball. What is the total probability of getting a Red ball (A)?
- P(A | B₁) = Prob. Red GIVEN Urn 1 = 2/5.
- P(A | B₂) = Prob. Red GIVEN Urn 2 = 4/5.
- P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂)
- P(A) = (2/5)(1/2) + (4/5)(1/2) = 1/5 + 2/5 = 3/5.
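The urn calculation generalizes to any partition. Here is a minimal sketch, assuming the priors and conditional probabilities are supplied as dictionaries keyed by partition label; the function name `total_probability` and the labels `urn1`/`urn2` are illustrative, not from the notes.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Return P(A) = sum over i of P(A | B_i) * P(B_i).

    priors:       {label: P(B_i)}, must sum to 1 (the B_i form a partition)
    conditionals: {label: P(A | B_i)}
    """
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1 for a partition")
    return sum(conditionals[b] * priors[b] for b in priors)

# Urn example: pick an urn at random, then draw one ball.
priors = {"urn1": Fraction(1, 2), "urn2": Fraction(1, 2)}
p_red_given = {"urn1": Fraction(2, 5), "urn2": Fraction(4, 5)}

p_red = total_probability(priors, p_red_given)
print(p_red)  # 3/5
```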
5.5 Independent Events
Definition
Two events A and B are statistically independent if the occurrence of one event does not affect the probability of the other event occurring.
Formal Definition: A and B are independent if and only if: P(A ∩ B) = P(A) * P(B)
This is the Multiplication Rule for Independent Events: the general multiplication rule with the conditioning dropped.
Equivalently, if A and B are independent (and P(A), P(B) > 0), then P(A | B) = P(A) and P(B | A) = P(B).
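Independence is not the same as mutual exclusivity:
- Mutually Exclusive: P(A ∩ B) = 0. (If A happens, B *cannot* happen). If both events have nonzero probability, they are highly dependent: knowing that A happened tells you B did not.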
- Independent: P(A ∩ B) = P(A) * P(B). (If A happens, it doesn't change B's probability). They *can* happen together (unless P(A) or P(B) is 0).
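The product test can be applied directly to events on a fair die. In this sketch, A and B are the events from 5.1, while C, E, and F are hypothetical events introduced only to show an independent pair and a mutually exclusive pair.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

def is_independent(x, y):
    """Formal definition: P(X ∩ Y) == P(X) * P(Y)."""
    return prob(x & y) == prob(x) * prob(y)

A = {4, 5, 6}  # "number > 3"
B = {2, 4, 6}  # "even number"
C = {1, 2}     # hypothetical: "number at most 2"
E, F = {1}, {2}  # hypothetical mutually exclusive pair

print(is_independent(A, B))  # False: 1/3 != 1/2 * 1/2
print(is_independent(C, B))  # True:  1/6 == 1/3 * 1/2
print(is_independent(E, F))  # False: 0 != 1/6 * 1/6
```

The last line shows why mutually exclusive events with nonzero probability fail the test: their intersection has probability 0, but the product of their probabilities does not.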
Pairwise vs. Mutual Independence
For three events (A, B, C):
- Pairwise Independent: They are independent in pairs.
- P(A ∩ B) = P(A)P(B)
- P(A ∩ C) = P(A)P(C)
- P(B ∩ C) = P(B)P(C)
- Mutually Independent: They are pairwise independent, AND...
- P(A ∩ B ∩ C) = P(A)P(B)P(C)
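Pairwise independence does not imply mutual independence. A standard illustration, not taken from these notes, uses two fair coin tosses: A = "first toss is heads", B = "second toss is heads", C = "both tosses match". The sketch below checks all four conditions by enumeration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin tosses: HH, HT, TH, TT (equally likely).
S = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # first toss is heads
B = {s for s in S if s[1] == "H"}   # second toss is heads
C = {s for s in S if s[0] == s[1]}  # both tosses match

pairwise = (
    prob(A & B) == prob(A) * prob(B)
    and prob(A & C) == prob(A) * prob(C)
    and prob(B & C) == prob(B) * prob(C)
)
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise)  # True
print(triple)    # False: P(A ∩ B ∩ C) = 1/4, but P(A)P(B)P(C) = 1/8
```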
5.6 Bayes' Theorem and Applications
Bayes' Theorem is one of the most important theorems in probability. It is used to update a probability based on new evidence.
It "flips" a conditional probability. We often know P(Evidence | Hypothesis), but we *want* to know P(Hypothesis | Evidence).
- We know P(Symptom | Disease).
- We want to know P(Disease | Symptom).
The Theorem
Let B₁, B₂, ..., Bₙ be a partition of the sample space (e.g., "Disease 1", "Disease 2", "No Disease"). Let A be some new evidence (e.g., "Positive Test Result").
Bayes' Theorem finds the probability of a specific hypothesis (say, Bₖ) given the evidence (A):
P(Bₖ | A) = [ P(A | Bₖ) * P(Bₖ) ] / P(A)
Using the Theorem of Total Probability for P(A), this is written as:
P(Bₖ | A) = [ P(A | Bₖ)P(Bₖ) ] / [ Σ P(A | Bᵢ)P(Bᵢ) ]
Terminology
- P(Bₖ): Prior Probability. Our belief in the hypothesis *before* seeing the new evidence. (e.g., the general prevalence of the disease in a population).
- P(A | Bₖ): Likelihood. The probability of seeing the evidence, *given* that our hypothesis is true. (e.g., the "true positive rate" or "sensitivity" of the test).
- P(A): Marginal Likelihood / Total Probability. The total probability of seeing the evidence, averaged over all possible hypotheses.
- P(Bₖ | A): Posterior Probability. Our *updated* belief in the hypothesis *after* seeing the evidence. This is the answer we are looking for.
Example
A factory has two machines. Machine 1 (B₁) makes 60% of products (P(B₁)=0.6). Machine 2 (B₂) makes 40% (P(B₂)=0.4).
Machine 1 has a 5% defect rate (P(A|B₁)=0.05).
Machine 2 has a 10% defect rate (P(A|B₂)=0.10).
Question: You find a defective product (A). What is the probability it came from Machine 1? (Find P(B₁ | A)).
- Find P(A) (Total Probability of a Defect):
P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂)
P(A) = (0.05)(0.60) + (0.10)(0.40)
P(A) = 0.03 + 0.04 = 0.07 (7% of all products are defective)
- Use Bayes' Formula:
P(B₁ | A) = [ P(A | B₁)P(B₁) ] / P(A)
P(B₁ | A) = (0.05 * 0.60) / 0.07
P(B₁ | A) = 0.03 / 0.07 = 3/7 ≈ 0.429
Interpretation: Even though Machine 1 makes more products, a defective item is more likely to be from Machine 2 (P(B₂|A) = 4/7 ≈ 0.571). Our prior belief (60%) that it was from Machine 1 has been updated *downward* (to about 42.9%) based on the new evidence that the product was defective.
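The factory example can be reproduced with a small helper that returns the posterior for every hypothesis at once. This is a sketch under stated assumptions: the function name `posterior` and the dictionary layout are illustrative, and exact fractions are used in place of the decimals in the text.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' Theorem over a partition.

    priors:      {hypothesis: P(B_i)}
    likelihoods: {hypothesis: P(A | B_i)}
    Returns      {hypothesis: P(B_i | A)}
    """
    # Denominator: total probability of the evidence, P(A).
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Factory example: share of production and defect rate per machine.
priors = {"machine1": Fraction(60, 100), "machine2": Fraction(40, 100)}
defect_rate = {"machine1": Fraction(5, 100), "machine2": Fraction(10, 100)}

post = posterior(priors, defect_rate)
print(post["machine1"], post["machine2"])  # 3/7 4/7
```

Because the hypotheses form a partition, the posteriors sum to 1, which is a quick check that the denominator was computed correctly.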