Unit 5: Probability Theorems

This unit builds on the axioms from Unit 4 to derive the fundamental rules for calculating complex probabilities.

5.1 Conditional Probability

Definition

The conditional probability of event A occurring, given that event B has already occurred, is denoted P(A | B).

Conditioning on B "restricts" the sample space to the outcomes in B. We are now only interested in the outcomes where B happened. Out of those, what is the probability that A *also* happened?

P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0
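
As a quick illustration of the "restricted sample space" idea, the sketch below enumerates the 36 equally likely outcomes of two fair dice and applies the formula directly. The events (A = "sum is 8", B = "first die is even") and the helper names `prob` and `cond_prob` are made up for this example; they are not part of the unit.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under equally likely outcomes; `event` is a predicate on an outcome."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: a(o) and b(o)) / p_b

# Hypothetical events chosen for illustration.
def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8

def first_is_even(outcome):
    return outcome[0] % 2 == 0

print(prob(sum_is_8))                      # 5/36
print(cond_prob(sum_is_8, first_is_even))  # 1/6
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 1/6, because only 18 outcomes remain and 3 of them sum to 8.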

5.2 Multiplication Theorem of Probability

This is just a rearrangement of the conditional probability formula. It's used to find the probability of an intersection (A AND B).

General Rule:
P(A ∩ B) = P(A) * P(B | A)

...or...

P(A ∩ B) = P(B) * P(A | B)

In words: "The probability of A and B both happening is the probability of A happening, *times* the probability of B happening *given that A has already happened*."
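
A small worked case may help here. The sketch below uses a hypothetical example (drawing two cards without replacement from a standard 52-card deck, with A = "first card is an ace" and B = "second card is an ace") and cross-checks the multiplication rule against a brute-force count.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(A ∩ B) = P(A) * P(B | A)
p_first_ace = Fraction(4, 52)               # P(A): 4 aces among 52 cards
p_second_ace_given_first = Fraction(3, 51)  # P(B | A): 3 aces left among 51 cards
p_both_rule = p_first_ace * p_second_ace_given_first

# Cross-check by counting ordered pairs of distinct cards.
deck = ["ace"] * 4 + ["other"] * 48
ordered_pairs = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in ordered_pairs if deck[i] == "ace" and deck[j] == "ace")
p_both_counted = Fraction(both_aces, len(ordered_pairs))

print(p_both_rule, p_both_counted)  # 1/221 1/221
```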

5.3 Addition Theorem of Probability

This rule is used to find the probability of a union (A OR B).

General Rule (for ANY two events):
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

In words: "We add P(A) and P(B). But if we do, we've double-counted the intersection (A ∩ B), so we must subtract it once."

Special Case (Mutually Exclusive Events):

If A and B are mutually exclusive, then A ∩ B = Ø, so P(A ∩ B) = 0.

The formula simplifies to P(A ∪ B) = P(A) + P(B), which is Axiom 3 from Unit 4.
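
The general rule and the special case can both be checked on a single fair die. The events below are hypothetical choices for illustration, represented as Python sets so that `|` is the union and `&` is the intersection.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one fair die

def prob(event):
    """P(event) for a set of equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# General rule: overlapping events.
a = {2, 4, 6}  # "roll is even"
b = {4, 5, 6}  # "roll is greater than 3"
union_direct = prob(a | b)
union_by_rule = prob(a) + prob(b) - prob(a & b)
print(union_direct, union_by_rule)  # 2/3 2/3

# Special case: mutually exclusive events (empty intersection).
c = {1, 2}
d = {5, 6}
print(prob(c & d))                      # 0
print(prob(c | d), prob(c) + prob(d))   # 2/3 2/3
```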

5.4 Theorem of Total Probability

This theorem is used to find the "total" probability of an event (A) when we don't know P(A) directly, but we *do* know its conditional probability given each of a set of other events.

Let B₁, B₂, ..., Bₙ be a set of events that are mutually exclusive and exhaustive (they form a "partition" of the entire sample space S), each with P(Bᵢ) > 0 so that P(A | Bᵢ) is defined.

Theorem:
P(A) = P(A | B₁)P(B₁) + P(A | B₂)P(B₂) + ... + P(A | Bₙ)P(Bₙ)

...which can be written using the multiplication rule as...

P(A) = P(A ∩ B₁) + P(A ∩ B₂) + ... + P(A ∩ Bₙ)
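
The theorem is a weighted sum, which the following sketch makes explicit. The function name `total_probability` and the three-supplier numbers are assumptions invented for this example.

```python
import math

def total_probability(priors, likelihoods):
    """Return P(A) = sum of P(A | B_i) * P(B_i) over a partition B_1..B_n.

    priors[i]      = P(B_i); must sum to 1 because the B_i are exhaustive
                     and mutually exclusive.
    likelihoods[i] = P(A | B_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per partition event")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("partition probabilities must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Hypothetical data: three suppliers (the partition) and their defect rates.
supplier_share = [0.5, 0.3, 0.2]     # P(B_1), P(B_2), P(B_3)
defect_rate = [0.01, 0.02, 0.05]     # P(A | B_1), P(A | B_2), P(A | B_3)

p_defect = total_probability(supplier_share, defect_rate)
print(round(p_defect, 4))  # 0.005 + 0.006 + 0.010 = 0.021
```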

5.5 Independent Events

Definition

Two events A and B are statistically independent if the occurrence of one event does not affect the probability of the other event occurring.

Formal Definition: A and B are independent if and only if:
P(A ∩ B) = P(A) * P(B)

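This product formula is itself the Multiplication Rule for Independent Events: the general rule P(A ∩ B) = P(A) * P(B | A) reduces to P(A) * P(B).

Equivalently, if A and B are independent (and P(A) > 0, P(B) > 0), then P(A | B) = P(A) and P(B | A) = P(B).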
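
The formal definition can be tested mechanically. In the sketch below, the two-dice events are hypothetical examples: "first die is even" turns out to be independent of "sum is 7" but not of "sum is 8".

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under equally likely outcomes; `event` is a predicate."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

def independent(a, b):
    """True if and only if P(A ∩ B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_is_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_is_even, sum_is_7))  # True:  1/12 == 1/2 * 1/6
print(independent(first_is_even, sum_is_8))  # False: 1/12 != 1/2 * 5/36
```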

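Independent vs. Mutually Exclusive

This is a very common point of confusion. Mutually exclusive events cannot occur together, so P(A ∩ B) = 0. Independent events can occur together, with P(A ∩ B) = P(A) * P(B). If P(A) > 0 and P(B) > 0, both conditions cannot hold at once: mutually exclusive events are in fact dependent, because knowing that A occurred tells you that B did not.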

Pairwise vs. Mutual Independence

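For three events (A, B, C):

Pairwise independence requires P(A ∩ B) = P(A) * P(B), P(A ∩ C) = P(A) * P(C), and P(B ∩ C) = P(B) * P(C).

Mutual independence requires all three pairwise conditions and, in addition, P(A ∩ B ∩ C) = P(A) * P(B) * P(C).

Mutual independence is a stronger condition than pairwise independence. You can have events that are pairwise independent but not mutually independent.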
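
A standard counterexample shows the gap between the two notions. The sketch below assumes two fair coin flips with A = "first flip is heads", B = "second flip is heads", and C = "both flips match"; these events are chosen for illustration and are not taken from the unit.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, 4 equally likely outcomes.
sample_space = list(product("HT", repeat=2))

def prob(event):
    """P(event) under equally likely outcomes; `event` is a predicate."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))

def a(o):
    return o[0] == "H"   # first flip is heads

def b(o):
    return o[1] == "H"   # second flip is heads

def c(o):
    return o[0] == o[1]  # both flips match

def both(x, y):
    return lambda o: x(o) and y(o)

# Every pair satisfies the product rule...
pairwise = all(
    prob(both(x, y)) == prob(x) * prob(y)
    for x, y in [(a, b), (a, c), (b, c)]
)

# ...but the triple intersection does not: P(A ∩ B ∩ C) = 1/4, not 1/8.
p_triple = prob(lambda o: a(o) and b(o) and c(o))
mutual = pairwise and p_triple == prob(a) * prob(b) * prob(c)

print(pairwise, mutual, p_triple)  # True False 1/4
```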

5.6 Bayes' Theorem and Applications

Bayes' Theorem is one of the most important theorems in probability. It is used to update a probability based on new evidence.

It "flips" a conditional probability. We often know P(Evidence | Hypothesis), but we *want* to know P(Hypothesis | Evidence).

The Theorem

Let B₁, B₂, ..., Bₙ be a partition of the sample space (e.g., "Disease 1", "Disease 2", "No Disease"). Let A be some new evidence (e.g., "Positive Test Result").

Bayes' Theorem finds the probability of a specific hypothesis (say, Bₖ) given the evidence (A):

P(Bₖ | A) = [ P(A | Bₖ) * P(Bₖ) ] / P(A)

Using the Theorem of Total Probability for P(A), this is written as:

P(Bₖ | A) = [ P(A | Bₖ)P(Bₖ) ] / [ Σ P(A | Bᵢ)P(Bᵢ) ]

Terminology

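Posterior = (Likelihood * Prior) / Evidence

Prior, P(Bₖ): the probability of the hypothesis before the evidence is seen.
Likelihood, P(A | Bₖ): the probability of the evidence if the hypothesis is true.
Evidence, P(A): the total probability of the evidence over all hypotheses.
Posterior, P(Bₖ | A): the updated probability of the hypothesis after the evidence is seen.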
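
The formula translates almost word for word into code. The sketch below is one possible way to write it; the function name `posteriors` and all numbers (a three-way partition "Disease 1", "Disease 2", "No Disease" with made-up priors and test sensitivities) are assumptions for illustration only.

```python
import math

def posteriors(priors, likelihoods):
    """Return the list of P(B_k | A) for every hypothesis B_k in a partition.

    priors[k]      = P(B_k)      (prior)
    likelihoods[k] = P(A | B_k)  (likelihood)
    """
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("priors must sum to 1")
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A ∩ B_k)
    evidence = sum(joint)                                 # P(A), by total probability
    if evidence == 0:
        raise ValueError("P(A) is zero; the posterior is undefined")
    return [j / evidence for j in joint]

# Hypothetical numbers, not real medical data.
hypotheses = ["Disease 1", "Disease 2", "No Disease"]
prior = [0.01, 0.04, 0.95]
p_positive_given = [0.90, 0.60, 0.05]

post = posteriors(prior, p_positive_given)
for name, p in zip(hypotheses, post):
    print(f"P({name} | positive test) = {p:.3f}")
```

With these made-up inputs, "No Disease" remains the most probable hypothesis after a positive test, because its large prior outweighs its small likelihood.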

Example

A factory has two machines. Machine 1 (B₁) makes 60% of products (P(B₁)=0.6). Machine 2 (B₂) makes 40% (P(B₂)=0.4).
Machine 1 has a 5% defect rate (P(A|B₁)=0.05).
Machine 2 has a 10% defect rate (P(A|B₂)=0.10).
Question: You find a defective product (A). What is the probability it came from Machine 1? (Find P(B₁ | A)).

  1. Find P(A) (Total Probability of a Defect):
    P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂)
    P(A) = (0.05)(0.60) + (0.10)(0.40)
    P(A) = 0.03 + 0.04 = 0.07 (7% of all products are defective)
  2. Use Bayes' Formula:
    P(B₁ | A) = [ P(A | B₁)P(B₁) ] / P(A)
    P(B₁ | A) = (0.05 * 0.60) / 0.07
    P(B₁ | A) = 0.03 / 0.07 = 3/7 ≈ 0.429

Interpretation: Even though Machine 1 makes more products, a defective item is more likely to be from Machine 2 (P(B₂|A) = 4/7 ≈ 0.571). Our prior belief (60%) that it was from Machine 1 has been updated *downward* (to about 42.9%) based on the new evidence that the product was defective.
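
One way to sanity-check the result is to simulate the factory and count. The sketch below is a rough Monte Carlo check under the example's stated rates; the function name `simulate_factory`, the sample size, and the fixed seed are arbitrary choices made here, and the estimate only approximates 3/7.

```python
import random

def simulate_factory(n_items, seed=0):
    """Estimate P(Machine 1 | defective) by simulating n_items products."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    defects_total = 0
    defects_from_m1 = 0
    for _ in range(n_items):
        from_m1 = rng.random() < 0.60             # P(B1) = 0.6, otherwise Machine 2
        defect_rate = 0.05 if from_m1 else 0.10   # P(A | B1), P(A | B2)
        if rng.random() < defect_rate:
            defects_total += 1
            if from_m1:
                defects_from_m1 += 1
    # Fraction of defective items that came from Machine 1.
    return defects_from_m1 / defects_total

estimate = simulate_factory(200_000)
print(f"simulated P(B1 | A) = {estimate:.3f}, exact 3/7 = {3 / 7:.3f}")
```

The simulation mirrors the "restricted sample space" view of conditioning: it keeps only the defective items and asks what share of them came from Machine 1.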