Unit 4: Discrete Probability Distributions
Introduction
This unit covers the most common "named" discrete distributions. For each one, you should know its story (when to use it), its p.m.f., its parameters, its Mean (E[X]), its Variance (Var(X)), and its M.G.F. (moment generating function).
4.1 Discrete Uniform Distribution
- Story: The "simplest" distribution. All outcomes are equally likely. (e.g., rolling a single fair die).
- Parameters:
n (the number of possible outcomes). Let the outcomes be x₁, x₂, ..., xₙ.
- P.M.F.:
p(x) = 1/n, for x = x₁, x₂, ..., xₙ
- Example (Fair Die): n=6. Outcomes are {1, 2, 3, 4, 5, 6}.
p(x) = 1/6, for x = 1, 2, 3, 4, 5, 6.
- Mean: E[X] = (n+1)/2 (for outcomes 1, 2, ..., n)
- Variance: Var(X) = (n²-1)/12 (for outcomes 1, 2, ..., n)
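The fair-die example can be checked numerically: a short sketch that computes E[X] and Var(X) by summing directly over the p.m.f. (n = 6 as in the example above).

```python
# Discrete uniform on {1, ..., n}: verify E[X] = (n+1)/2 and
# Var(X) = (n² - 1)/12 by direct summation for a fair die (n = 6).
n = 6
p = 1 / n                                   # p(x) = 1/n for every outcome
outcomes = range(1, n + 1)

mean = sum(x * p for x in outcomes)                  # E[X] ≈ (6+1)/2 = 3.5
var = sum((x - mean) ** 2 * p for x in outcomes)     # Var(X) ≈ 35/12 ≈ 2.9167
```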
4.2 Bernoulli Distribution
- Story: A single trial with exactly two outcomes: "Success" (x=1) or "Failure" (x=0). (e.g., one coin flip).
- Parameters:
p (the probability of success).
- P.M.F.:
p(x) = p^x * (1-p)^(1-x), for x = 0, 1
This is a clever way to write it:
If x=1 (Success): p¹ * (1-p)⁰ = p
If x=0 (Failure): p⁰ * (1-p)¹ = 1-p
- Mean: E[X] = p
- Variance: Var(X) = p * (1-p) = pq (where q = 1-p)
- M.G.F.: M(t) = (1-p) + p*e^t = q + p*e^t
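The "clever" compact p.m.f. can be evaluated at both support points to confirm it picks out p and 1-p, and the mean/variance formulas follow by summation (p = 0.3 is an arbitrary example value).

```python
# Bernoulli(p): the compact p.m.f. p^x * (1-p)^(1-x) reproduces
# P(X=1) = p and P(X=0) = 1-p, and gives E[X] = p, Var(X) = p(1-p).
p = 0.3

def bern_pmf(x):
    return p ** x * (1 - p) ** (1 - x)

assert bern_pmf(1) == p          # x = 1: p^1 * q^0 = p
assert bern_pmf(0) == 1 - p      # x = 0: p^0 * q^1 = q

mean = sum(x * bern_pmf(x) for x in (0, 1))                 # = p
var = sum((x - mean) ** 2 * bern_pmf(x) for x in (0, 1))    # = p(1-p)
```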
4.3 Binomial Distribution
- Story: The total number of successes (X) in n independent Bernoulli trials, each with the same success probability p.
- Assumptions (B.I.N.S.):
- Binary: Each trial is Success/Failure.
- Independent: Trials are independent.
- Number: Fixed number of trials, n.
- Same: Probability of success p is the same for all trials.
- Parameters:
n (number of trials), p (probability of success). We write X ~ Bin(n, p).
- P.M.F.:
p(x) = C(n, x) * p^x * (1-p)^(n-x), for x = 0, 1, ..., n
Where C(n, x) = "n choose x" = n! / (x! * (n-x)!)
- Mean: E[X] = n * p
- Variance: Var(X) = n * p * (1-p) = npq
- M.G.F.: M(t) = (q + p*e^t)^n
- Relationship: A Binomial(n, p) is the sum of n independent Bernoulli(p) random variables.
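The p.m.f., mean, and variance formulas can all be checked directly with `math.comb`; the values n = 10, p = 0.4 below are arbitrary example parameters.

```python
from math import comb

def binom_pmf(x, n, p):
    # C(n, x) * p^x * (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.4
support = range(n + 1)
pmf = [binom_pmf(x, n, p) for x in support]

total = sum(pmf)                                    # probabilities sum to 1
mean = sum(x * px for x, px in zip(support, pmf))   # ≈ n*p = 4
var = sum((x - mean) ** 2 * px
          for x, px in zip(support, pmf))           # ≈ n*p*(1-p) = 2.4
```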
4.4 Poisson Distribution
- Story: The number of events (X) occurring in a fixed interval of time or space, when the events occur at a known average rate λ (lambda) and independently of the time since the last event.
- Examples:
- Number of phone calls to a call center in one hour.
- Number of typos on a page in a book.
- Number of radioactive decay events in one second.
- Parameters:
λ (the average rate of events per interval). We write X ~ Poi(λ).
- P.M.F.:
p(x) = (e^(-λ) * λ^x) / x!, for x = 0, 1, 2, ...
- Mean: E[X] = λ
- Variance: Var(X) = λ
- Property: The mean is equal to the variance. This is a key identifying feature of the Poisson distribution.
- M.G.F.: M(t) = e^(λ(e^t - 1))
- Property (Additivity): If X ~ Poi(λ₁) and Y ~ Poi(λ₂) are independent, then (X+Y) ~ Poi(λ₁ + λ₂).
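The mean = variance property can be verified numerically (λ = 3 is an arbitrary example); since the support is infinite, the sums are truncated where the remaining tail mass is negligible.

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # (e^(-λ) * λ^x) / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 3.0
xs = range(60)        # tail mass beyond x = 60 is negligible for λ = 3

total = sum(pois_pmf(x, lam) for x in xs)                  # ≈ 1
mean = sum(x * pois_pmf(x, lam) for x in xs)               # ≈ λ
var = sum((x - mean) ** 2 * pois_pmf(x, lam) for x in xs)  # ≈ λ as well
```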
Poisson Approximation to Binomial:
The Poisson distribution can be used as an approximation for the Binomial(n, p) distribution when:
- n is very large (e.g., n > 100)
- p is very small (e.g., p < 0.01)
In this case, we set λ = n * p. This is used because the Binomial C(n,x) formula becomes computationally difficult with large n.
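A minimal sketch of the approximation in action, using the arbitrary values n = 1000 and p = 0.003 (so λ = n*p = 3): the two p.m.f.s agree to within a few parts in ten thousand.

```python
from math import comb, exp, factorial

# Compare Binomial(n=1000, p=0.003) with Poisson(λ = n*p = 3)
# over the first few values of x.
n, p = 1000, 0.003
lam = n * p

binom = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(10)]
poiss = [exp(-lam) * lam ** x / factorial(x) for x in range(10)]

# Largest pointwise gap between the exact and approximate p.m.f.s
max_err = max(abs(b - q) for b, q in zip(binom, poiss))
```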
4.5 Geometric Distribution
- Story: The number of Bernoulli trials (X) needed to get the first success.
- Example: Keep flipping a coin until you get the first Head. X is the number of flips.
- Parameters:
p (probability of success on any given trial).
- P.M.F.:
p(x) = (1-p)^(x-1) * p, for x = 1, 2, 3, ...
(This means you have x-1 failures, followed by 1 success).
- Mean: E[X] = 1 / p
- Variance: Var(X) = (1-p) / p² = q / p²
- M.G.F.: M(t) = (p * e^t) / (1 - (1-p)*e^t), for t < -ln(1-p)
- Property (Memorylessness): The Geometric distribution is "memoryless." P(X > a+b | X > a) = P(X > b). This means if you've already waited 'a' trials without success, the probability of waiting an additional 'b' trials is the same as if you just started.
(Note: Some textbooks define X as the number of *failures* before the first success. We use the "number of trials" definition here, which is consistent with the Negative Binomial definition in Section 4.6.)
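Memorylessness can be checked arithmetically, using the fact that P(X > k) = (1-p)^k (exceeding k trials means the first k trials were all failures). The values p = 0.25, a = 4, b = 3 are arbitrary.

```python
# For X ~ Geometric(p), P(X > k) = q^k, so
# P(X > a+b | X > a) = q^(a+b) / q^a = q^b = P(X > b).
p = 0.25
q = 1 - p
a, b = 4, 3

def tail(k):
    # P(X > k): the first k trials are all failures
    return q ** k

cond = tail(a + b) / tail(a)    # P(X > a+b | X > a)
# cond equals tail(b) = P(X > b): the wait "restarts" after a failures
```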
4.6 Negative Binomial Distribution
- Story: A generalization of the Geometric. It is the number of Bernoulli trials (X) needed to achieve a fixed number of successes, r.
- Example: Keep flipping a coin until you get 3 Heads (r=3). X is the total number of flips.
- Parameters:
r (number of successes to achieve), p (probability of success).
- P.M.F.:
p(x) = C(x-1, r-1) * p^r * (1-p)^(x-r), for x = r, r+1, ...
Logic: For the r-th success to be on the x-th trial, two things must happen:
1. The first (x-1) trials must contain exactly (r-1) successes. (This is C(x-1, r-1)).
2. The x-th trial must be a success (This is 'p').
- Mean: E[X] = r / p
- Variance: Var(X) = r * (1-p) / p² = rq / p²
- Relationship: The Geometric distribution is just a Negative Binomial with r=1; equivalently, a Negative Binomial(r, p) is the sum of r independent Geometric(p) random variables.
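The two-step logic of the p.m.f. translates directly into code; a truncated sum then confirms the probabilities total 1 and the mean is r/p (r = 3, p = 0.4 are arbitrary).

```python
from math import comb

def nb_pmf(x, r, p):
    # C(x-1, r-1) * p^r * (1-p)^(x-r): exactly (r-1) successes somewhere
    # in the first (x-1) trials, then a success on trial x itself.
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

r, p = 3, 0.4
xs = range(r, 300)          # tail mass beyond x = 300 is negligible

total = sum(nb_pmf(x, r, p) for x in xs)       # ≈ 1
mean = sum(x * nb_pmf(x, r, p) for x in xs)    # ≈ r/p = 7.5
```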
4.7 Hypergeometric Distribution
- Story: This is the "Binomial without replacement." It's the number of successes (X) you get in a sample of size n, drawn without replacement from a finite population of size N that contains K successes.
- Example: An urn contains 50 balls (N=50), 20 of which are red (K=20). You draw 10 balls (n=10) *without replacement*. X is the number of red balls you drew.
- Parameters:
N (total population size), K (total number of successes in population), n (sample size).
- P.M.F.:
p(x) = [ C(K, x) * C(N-K, n-x) ] / C(N, n)
Logic: (Ways to choose x successes from K) * (Ways to choose n-x failures from N-K) / (Total ways to choose n items from N).
- Mean: E[X] = n * (K / N) = n * p (where p = K/N is the initial proportion of successes).
- The mean is the same as the Binomial mean!
- Variance: Var(X) = n * (K/N) * (1 - K/N) * [ (N-n) / (N-1) ]
The term (N-n)/(N-1) is the Finite Population Correction (FPC) factor. As N → ∞, the FPC → 1, and the variance becomes the Binomial variance. This is why Hypergeometric → Binomial as the population size gets large.
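Using the urn example from above (N=50, K=20, n=10), the counting-based p.m.f. can be checked against the mean n*(K/N) and the FPC-corrected variance.

```python
from math import comb

def hyper_pmf(x, N, K, n):
    # C(K, x) * C(N-K, n-x) / C(N, n)
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

N, K, n = 50, 20, 10        # urn: 50 balls, 20 red, draw 10 w/o replacement
xs = range(0, min(n, K) + 1)

mean = sum(x * hyper_pmf(x, N, K, n) for x in xs)
var = sum((x - mean) ** 2 * hyper_pmf(x, N, K, n) for x in xs)

p = K / N                   # initial proportion of successes = 0.4
fpc = (N - n) / (N - 1)     # finite population correction factor
# mean ≈ n*p = 4;  var ≈ n*p*(1-p)*fpc, smaller than the Binomial's npq
```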
4.8 Summary Table & Relationships