Unit 5: Probability Theorems
This unit builds on the axioms from Unit 4 to derive the fundamental rules for calculating complex probabilities.
5.1 Conditional Probability
Definition
The conditional probability of event A occurring, given that event B has already occurred, is denoted P(A | B).
Conditioning on B "restricts" the sample space: we are now only interested in the outcomes where B happened. Out of those, what is the probability that A *also* happened?
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0
- Example: Rolling a fair die. S = {1, 2, 3, 4, 5, 6}.
- Let A = "Get a number > 3" = {4, 5, 6}. P(A) = 3/6.
- Let B = "Get an even number" = {2, 4, 6}. P(B) = 3/6.
- A ∩ B = {4, 6}. P(A ∩ B) = 2/6.
- Question: What is the probability of getting a number > 3, GIVEN that it was an even number? (Find P(A | B)).
- Method 1 (Formula): P(A | B) = P(A ∩ B) / P(B) = (2/6) / (3/6) = 2/3.
- Method 2 (Logic): We know the outcome is B = {2, 4, 6}. This is our new sample space (size 3). Within this set, which outcomes are also in A? {4, 6} (size 2). Therefore, the probability is 2/3.
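Both methods can be checked by brute-force enumeration. The following is a minimal Python sketch, assuming equally likely outcomes; the helper names `prob` and `cond_prob` are illustrative and not part of the unit.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (all outcomes equally likely).
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x > 3}       # "number > 3"  -> {4, 5, 6}
B = {x for x in S if x % 2 == 0}  # "even number" -> {2, 4, 6}

def prob(event, space=S):
    """P(event) under equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space=S):
    """P(a | b) = P(a ∩ b) / P(b); only defined when P(b) > 0."""
    if prob(b, space) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, space) / prob(b, space)

method_1 = cond_prob(A, B)               # Method 1: the formula
method_2 = Fraction(len(A & B), len(B))  # Method 2: count inside the restricted space B
print(method_1, method_2)  # 2/3 2/3
```

Using `Fraction` instead of floats keeps the results exact, so they can be compared directly with the hand calculation.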
5.2 Multiplication Theorem of Probability
This is just a rearrangement of the conditional probability formula. It's used to find the probability of an intersection (A AND B).
General Rule:
P(A ∩ B) = P(A) * P(B | A)
...or...
P(A ∩ B) = P(B) * P(A | B)
In words: "The probability of A and B both happening is the probability of A happening, *times* the probability of B happening *given that A has already happened*."
- Example: Draw 2 cards from a deck *without replacement*. What is P(King and then Queen)?
- A = 1st card is King. P(A) = 4/52.
- B = 2nd card is Queen.
- We need P(A ∩ B) = P(A) * P(B | A).
- P(B | A) = "Prob. of 2nd being Queen, GIVEN 1st was a King." After taking one King, there are 51 cards left, 4 of which are Queens. So, P(B | A) = 4/51.
- P(A ∩ B) = (4/52) * (4/51) = 16/2652 = 4/663 ≈ 0.006.
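The card example can be cross-checked by counting every ordered two-card draw. This is a sketch under a simplifying assumption: the deck is modelled by rank only (4 Kings, 4 Queens, 44 other cards), and the variable names are illustrative. Note that 16/2652 reduces to 4/663.

```python
from fractions import Fraction
from itertools import permutations

# Rank-only model of a 52-card deck (an assumption made for this sketch).
deck = ["K"] * 4 + ["Q"] * 4 + ["other"] * 44

# Multiplication rule: P(A ∩ B) = P(A) * P(B | A).
p_a = Fraction(4, 52)          # first card is a King
p_b_given_a = Fraction(4, 51)  # second is a Queen, with one King already removed
by_rule = p_a * p_b_given_a

# Cross-check: enumerate all ordered draws of two distinct cards (52 * 51 of them).
draws = list(permutations(range(len(deck)), 2))
hits = sum(1 for i, j in draws if deck[i] == "K" and deck[j] == "Q")
by_count = Fraction(hits, len(draws))

print(by_rule, by_count)  # 4/663 4/663
```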
5.3 Addition Theorem of Probability
This rule is used to find the probability of a union (A OR B).
General Rule (for ANY two events):
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
In words: "We add P(A) and P(B). But if we do, we've double-counted the intersection (A ∩ B), so we must subtract it once."
Special Case (Mutually Exclusive Events):
If A and B are mutually exclusive, then A ∩ B = Ø, so P(A ∩ B) = 0.
The formula simplifies to: P(A ∪ B) = P(A) + P(B). (This is Axiom 3 from Unit 4).
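As a quick numerical check of both forms of the rule, the sketch below reuses the die events from 5.1. The event `odd` and the helper `prob` are assumptions introduced here for illustration only.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """P(event) under equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event), len(S))

A = {4, 5, 6}  # "number > 3"
B = {2, 4, 6}  # "even number"

# General rule: subtract the intersection that was counted twice.
union_direct = prob(A | B)
union_by_rule = prob(A) + prob(B) - prob(A & B)
print(union_direct, union_by_rule)  # 2/3 2/3

# Special case: "odd" and "even" cannot happen together.
odd = {1, 3, 5}
exclusive_direct = prob(odd | B)
exclusive_by_rule = prob(odd) + prob(B)
print(exclusive_direct, exclusive_by_rule)  # 1 1
```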
5.4 Theorem of Total Probability
This theorem is used to find the "total" probability of an event (A) when we don't know P(A) directly, but we *do* know its conditional probability based on a set of other events.
Let B₁, B₂, ..., Bₙ be a set of events that are mutually exclusive and exhaustive (they form a "partition" of the entire sample space S), each with P(Bᵢ) > 0 so that P(A | Bᵢ) is defined.
Theorem:
P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂) + ... + P(A | Bₙ)P(Bₙ)
...which can be written using the multiplication rule as...
P(A) = P(A ∩ B₁) + P(A ∩ B₂) + ... + P(A ∩ Bₙ)
- Example:
- Urn 1 (B₁) has 2 Red, 3 White balls. P(B₁) = 1/2.
- Urn 2 (B₂) has 4 Red, 1 White ball. P(B₂) = 1/2.
- You pick an urn at random and then pick one ball. What is the total probability of getting a Red ball (A)?
- P(A | B₁) = Prob. Red GIVEN Urn 1 = 2/5.
- P(A | B₂) = Prob. Red GIVEN Urn 2 = 4/5.
- P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂)
- P(A) = (2/5)(1/2) + (4/5)(1/2) = 1/5 + 2/5 = 3/5.
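The urn calculation can be written as a small weighted sum. The function below is a minimal sketch, not a library routine; the name `total_probability` and its argument layout are assumptions made for this example.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Return P(A) = sum of P(A | B_i) * P(B_i) over a partition B_1..B_n.

    priors:      list of P(B_i), which must sum to 1
    likelihoods: list of P(A | B_i), in the same order
    """
    if sum(priors) != 1:
        raise ValueError("the B_i must form a partition: priors must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Urn 1: 2 Red, 3 White. Urn 2: 4 Red, 1 White. Each urn is picked with probability 1/2.
priors = [Fraction(1, 2), Fraction(1, 2)]
p_red_given_urn = [Fraction(2, 5), Fraction(4, 5)]

p_red = total_probability(priors, p_red_given_urn)
print(p_red)  # 3/5
```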
5.5 Independent Events
Definition
Two events A and B are statistically independent if the occurrence of one event does not affect the probability of the other event occurring.
Formal Definition: A and B are independent if and only if:
P(A ∩ B) = P(A) * P(B)
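This formula is the Multiplication Rule for Independent Events: it is the general rule from 5.2 with P(B | A) replaced by P(B).
Equivalently, if A and B are independent and both have positive probability, then P(A | B) = P(A) and P(B | A) = P(B). Knowing that one event occurred does not change the probability of the other.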
Independent vs. Mutually Exclusive - This is a very common point of confusion.
- Mutually Exclusive: P(A ∩ B) = 0. (If A happens, B *cannot* happen). If both events have positive probability, they are highly dependent: knowing that A happened drops the probability of B to 0.
- Independent: P(A ∩ B) = P(A) * P(B). (If A happens, it doesn't change B's probability). They *can* happen together (unless P(A) or P(B) is 0).
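The distinction can be tested directly with the product condition. The sketch below uses a fair die; the events `even`, `odd`, `at_most_2`, and `greater_than_3` and the helper `is_independent` are assumptions chosen for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """P(event) under equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event), len(S))

def is_independent(a, b):
    """True exactly when P(a ∩ b) = P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_2 = {1, 2}
greater_than_3 = {4, 5, 6}

# Independent: P(even ∩ at_most_2) = 1/6 = (1/2) * (1/3).
print(is_independent(even, at_most_2))       # True

# Mutually exclusive, hence dependent: P(even ∩ odd) = 0, but (1/2) * (1/2) = 1/4.
print(is_independent(even, odd))             # False

# Overlapping but still dependent: P(∩) = 1/3, but (1/2) * (1/2) = 1/4.
print(is_independent(greater_than_3, even))  # False
```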
Pairwise vs. Mutual Independence
For three events (A, B, C):
- Pairwise Independent: They are independent in pairs.
- P(A ∩ B) = P(A)P(B)
- P(A ∩ C) = P(A)P(C)
- P(B ∩ C) = P(B)P(C)
- Mutually Independent: They are pairwise independent, AND...
- P(A ∩ B ∩ C) = P(A)P(B)P(C)
Mutual independence is a stronger condition than pairwise independence. You can have events that are pairwise independent but not mutually independent.
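A standard illustration uses two fair coin tosses with A = "first toss is heads", B = "second toss is heads", and C = "both tosses match". This example is not from the unit itself; the sketch below checks all four product conditions by enumeration.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two tosses of a fair coin, four equally likely outcomes.
S = set(product("HT", repeat=2))

def prob(event):
    """P(event) under equally likely outcomes (hypothetical helper)."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # first toss is heads
B = {s for s in S if s[1] == "H"}   # second toss is heads
C = {s for s in S if s[0] == s[1]}  # both tosses show the same face

# Pairwise independence: the product condition holds for every pair.
pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in combinations([A, B, C], 2))

# Mutual independence additionally needs the triple product condition.
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, triple)  # True False
print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 1/4 1/8
```

Any two of the events determine the third, which is why the triple condition fails even though every pair passes.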
5.6 Bayes' Theorem and Applications
Bayes' Theorem is one of the most important theorems in probability. It is used to update a probability based on new evidence.
It "flips" a conditional probability. We often know P(Evidence | Hypothesis), but we *want* to know P(Hypothesis | Evidence).
- We know P(Symptom | Disease).
- We want to know P(Disease | Symptom).
The Theorem
Let B₁, B₂, ..., Bₙ be a partition of the sample space (e.g., "Disease 1", "Disease 2", "No Disease"). Let A be some new evidence (e.g., "Positive Test Result").
Bayes' Theorem finds the probability of a specific hypothesis (say, Bₖ) given the evidence (A):
P(Bₖ | A) = [ P(A | Bₖ) * P(Bₖ) ] / P(A)
Using the Theorem of Total Probability for P(A), this is written as:
P(Bₖ | A) = [ P(A | Bₖ)P(Bₖ) ] / [ Σ P(A | Bᵢ)P(Bᵢ) ]
Terminology
- P(Bₖ): Prior Probability. Our belief in the hypothesis *before* seeing the new evidence. (e.g., the general prevalence of the disease in a population).
- P(A | Bₖ): Likelihood. The probability of seeing the evidence, *given* that our hypothesis is true. (e.g., the "true positive rate" or "sensitivity" of the test).
- P(A): Marginal Likelihood / Total Probability (also called the Evidence). The total probability of seeing the evidence, averaged over all possible hypotheses.
- P(Bₖ | A): Posterior Probability. Our *updated* belief in the hypothesis *after* seeing the evidence. This is the answer we are looking for.
Posterior = (Likelihood * Prior) / Evidence
Example
A factory has two machines. Machine 1 (B₁) makes 60% of products (P(B₁)=0.6). Machine 2 (B₂) makes 40% (P(B₂)=0.4).
Machine 1 has a 5% defect rate (P(A|B₁)=0.05).
Machine 2 has a 10% defect rate (P(A|B₂)=0.10).
Question: You find a defective product (A). What is the probability it came from Machine 1? (Find P(B₁ | A)).
- Find P(A) (Total Probability of a Defect):
P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂)
P(A) = (0.05)(0.60) + (0.10)(0.40)
P(A) = 0.03 + 0.04 = 0.07 (7% of all products are defective)
- Use Bayes' Formula:
P(B₁ | A) = [ P(A | B₁)P(B₁) ] / P(A)
P(B₁ | A) = (0.05 * 0.60) / 0.07
P(B₁ | A) = 0.03 / 0.07 = 3/7 ≈ 0.429
Interpretation: Even though Machine 1 makes more products, a defective item is more likely to be from Machine 2 (P(B₂|A) = 4/7 ≈ 0.571). Our prior belief (60%) that it was from Machine 1 has been updated *downward* (to about 42.9%) based on the new evidence that the product was defective.
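The factory example can be reproduced with a short function that computes every posterior at once. This is a minimal sketch; the name `posterior` and the dictionary layout are assumptions made for illustration, and exact fractions are used so the result matches 3/7 and 4/7 without rounding.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(B_k | A) for every hypothesis k via Bayes' Theorem.

    priors:      dict mapping hypothesis -> P(B_k)
    likelihoods: dict mapping hypothesis -> P(A | B_k)
    """
    # Numerators: Likelihood * Prior for each hypothesis.
    joint = {k: likelihoods[k] * priors[k] for k in priors}
    # Denominator: P(A) from the Theorem of Total Probability.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("P(A) is zero, so the posterior is undefined")
    return {k: v / evidence for k, v in joint.items()}

priors = {"Machine 1": Fraction(60, 100), "Machine 2": Fraction(40, 100)}
defect_rates = {"Machine 1": Fraction(5, 100), "Machine 2": Fraction(10, 100)}

post = posterior(priors, defect_rates)
print(post["Machine 1"], post["Machine 2"])  # 3/7 4/7
```

The posteriors sum to 1 because the denominator is the sum of the numerators over the whole partition.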