Unit 4: Discrete Probability Distributions
Introduction
This unit covers the most common "named" discrete distributions. For each one, you should know its story (when to use it), its p.m.f., its parameters, its Mean (E[X]), its Variance (Var(X)), and its M.G.F. (moment generating function).
4.1 Discrete Uniform Distribution
- Story: The "simplest" distribution. All outcomes are equally likely. (e.g., rolling a single fair die).
- Parameters:
n (the number of possible outcomes). Let the outcomes be x₁, x₂, ..., xₙ.
- P.M.F.:
p(x) = 1/n, for x = x₁, x₂, ..., xₙ
- Example (Fair Die): n=6. Outcomes are {1, 2, 3, 4, 5, 6}.
p(x) = 1/6, for x = 1, 2, 3, 4, 5, 6.
- Mean: E[X] = (n+1)/2 (for outcomes 1, 2, ..., n)
- Variance: Var(X) = (n²-1)/12 (for outcomes 1, 2, ..., n)
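The fair-die example can be checked numerically: a short sketch that computes E[X] and Var(X) by summing directly over the p.m.f. (n = 6 as in the example above).

```python
# Discrete uniform on {1, ..., n}: verify E[X] = (n+1)/2 and
# Var(X) = (n² - 1)/12 by direct summation for a fair die (n = 6).
n = 6
p = 1 / n                                   # p(x) = 1/n for every outcome
outcomes = range(1, n + 1)

mean = sum(x * p for x in outcomes)                  # E[X] ≈ (6+1)/2 = 3.5
var = sum((x - mean) ** 2 * p for x in outcomes)     # Var(X) ≈ 35/12 ≈ 2.9167
```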
4.2 Bernoulli Distribution
- Story: A single trial with exactly two outcomes: "Success" (x=1) or "Failure" (x=0). (e.g., one coin flip).
- Parameters:
p (the probability of success).
- P.M.F.:
p(x) = p^x * (1-p)^(1-x), for x = 0, 1
This is a clever way to write it:
If x=1 (Success): p¹ * (1-p)⁰ = p
If x=0 (Failure): p⁰ * (1-p)¹ = 1-p
- Mean: E[X] = p
- Variance: Var(X) = p * (1-p) = pq (where q = 1-p)
- M.G.F.: M(t) = (1-p) + p*e^t = q + p*e^t
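The "clever" compact p.m.f. can be evaluated at both support points to confirm it picks out p and 1-p, and the mean/variance formulas follow by summation (p = 0.3 is an arbitrary example value).

```python
# Bernoulli(p): the compact p.m.f. p^x * (1-p)^(1-x) reproduces
# P(X=1) = p and P(X=0) = 1-p, and gives E[X] = p, Var(X) = p(1-p).
p = 0.3

def bern_pmf(x):
    return p ** x * (1 - p) ** (1 - x)

assert bern_pmf(1) == p          # x = 1: p^1 * q^0 = p
assert bern_pmf(0) == 1 - p      # x = 0: p^0 * q^1 = q

mean = sum(x * bern_pmf(x) for x in (0, 1))                 # = p
var = sum((x - mean) ** 2 * bern_pmf(x) for x in (0, 1))    # = p(1-p)
```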
4.3 Binomial Distribution
- Story: The total number of successes (X) in n independent Bernoulli trials, each with the same success probability p.
- Assumptions (B.I.N.S.):
- Binary: Each trial is Success/Failure.
- Independent: Trials are independent.
- Number: Fixed number of trials, n.
- Same: Probability of success p is the same for all trials.
- Parameters:
n (number of trials), p (probability of success). We write X ~ Bin(n, p).
- P.M.F.:
p(x) = C(n, x) * p^x * (1-p)^(n-x), for x = 0, 1, ..., n
Where C(n, x) = "n choose x" = n! / (x! * (n-x)!)
- Mean: E[X] = n * p
- Variance: Var(X) = n * p * (1-p) = npq
- M.G.F.: M(t) = (q + p*e^t)^n
- Relationship: A Binomial(n, p) is the sum of n independent Bernoulli(p) random variables.
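The p.m.f., mean, and variance formulas can all be checked directly with `math.comb`; the values n = 10, p = 0.4 below are arbitrary example parameters.

```python
from math import comb

def binom_pmf(x, n, p):
    # C(n, x) * p^x * (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.4
support = range(n + 1)
pmf = [binom_pmf(x, n, p) for x in support]

total = sum(pmf)                                    # probabilities sum to 1
mean = sum(x * px for x, px in zip(support, pmf))   # ≈ n*p = 4
var = sum((x - mean) ** 2 * px
          for x, px in zip(support, pmf))           # ≈ n*p*(1-p) = 2.4
```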
4.4 Poisson Distribution
- Story: The number of events (X) occurring in a fixed interval of time or space, when the events occur at a known average rate λ (lambda) and independently of the time since the last event.
- Examples:
- Number of phone calls to a call center in one hour.
- Number of typos on a page in a book.
- Number of radioactive decay events in one second.
- Parameters:
λ (the average rate of events per interval). We write X ~ Poi(λ).
- P.M.F.:
p(x) = (e^(-λ) * λ^x) / x!, for x = 0, 1, 2, ...
- Mean: E[X] = λ
- Variance: Var(X) = λ
- Property: The mean is equal to the variance. This is a key identifying feature of the Poisson distribution.
- M.G.F.: M(t) = e^(λ(e^t - 1))
- Property (Additivity): If X ~ Poi(λ₁) and Y ~ Poi(λ₂) are independent, then (X+Y) ~ Poi(λ₁ + λ₂).
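The mean = variance property can be verified numerically (λ = 3 is an arbitrary example); since the support is infinite, the sums are truncated where the remaining tail mass is negligible.

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # (e^(-λ) * λ^x) / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 3.0
xs = range(60)        # tail mass beyond x = 60 is negligible for λ = 3

total = sum(pois_pmf(x, lam) for x in xs)                  # ≈ 1
mean = sum(x * pois_pmf(x, lam) for x in xs)               # ≈ λ
var = sum((x - mean) ** 2 * pois_pmf(x, lam) for x in xs)  # ≈ λ as well
```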
Poisson Approximation to Binomial:
The Poisson distribution can be used as an approximation for the Binomial(n, p) distribution when:
- n is very large (e.g., n > 100)
- p is very small (e.g., p < 0.01)
In this case, we set λ = n * p. This is used because the Binomial C(n,x) formula becomes computationally difficult with large n.
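A minimal sketch of the approximation in action, using the arbitrary values n = 1000 and p = 0.003 (so λ = n*p = 3): the two p.m.f.s agree to within a few parts in ten thousand.

```python
from math import comb, exp, factorial

# Compare Binomial(n=1000, p=0.003) with Poisson(λ = n*p = 3)
# over the first few values of x.
n, p = 1000, 0.003
lam = n * p

binom = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(10)]
poiss = [exp(-lam) * lam ** x / factorial(x) for x in range(10)]

# Largest pointwise gap between the exact and approximate p.m.f.s
max_err = max(abs(b - q) for b, q in zip(binom, poiss))
```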
4.5 Geometric Distribution
- Story: The number of Bernoulli trials (X) needed to get the first success.
- Example: Keep flipping a coin until you get the first Head. X is the number of flips.
- Parameters:
p (probability of success on any given trial).
- P.M.F.:
p(x) = (1-p)^(x-1) * p, for x = 1, 2, 3, ...
(This means you have x-1 failures, followed by 1 success).
- Mean: E[X] = 1 / p
- Variance: Var(X) = (1-p) / p² = q / p²
- M.G.F.: M(t) = (p * e^t) / (1 - (1-p)*e^t), for t < -ln(1-p)
- Property (Memorylessness): The Geometric distribution is "memoryless." P(X > a+b | X > a) = P(X > b). This means if you've already waited 'a' trials without success, the probability of waiting an additional 'b' trials is the same as if you just started.
(Note: Some textbooks define X as the number of *failures* before the first success. We use the "number of trials" definition here, which is consistent with the Negative Binomial definition in Section 4.6.)
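Memorylessness can be checked arithmetically, using the fact that P(X > k) = (1-p)^k (exceeding k trials means the first k trials were all failures). The values p = 0.25, a = 4, b = 3 are arbitrary.

```python
# For X ~ Geometric(p), P(X > k) = q^k, so
# P(X > a+b | X > a) = q^(a+b) / q^a = q^b = P(X > b).
p = 0.25
q = 1 - p
a, b = 4, 3

def tail(k):
    # P(X > k): the first k trials are all failures
    return q ** k

cond = tail(a + b) / tail(a)    # P(X > a+b | X > a)
# cond equals tail(b) = P(X > b): the wait "restarts" after a failures
```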
4.6 Negative Binomial Distribution
- Story: A generalization of the Geometric. It is the number of Bernoulli trials (X) needed to achieve a fixed number of successes, r.
- Example: Keep flipping a coin until you get 3 Heads (r=3). X is the total number of flips.
- Parameters:
r (number of successes to achieve), p (probability of success).
- P.M.F.:
p(x) = C(x-1, r-1) * p^r * (1-p)^(x-r), for x = r, r+1, ...
Logic: For the r-th success to be on the x-th trial, two things must happen:
1. The first (x-1) trials must contain exactly (r-1) successes. (This is C(x-1, r-1)).
2. The x-th trial must be a success (This is 'p').
- Mean: E[X] = r / p
- Variance: Var(X) = r * (1-p) / p² = rq / p²
- Relationship: The Geometric distribution is just a Negative Binomial with r=1; equivalently, a Negative Binomial(r, p) is the sum of r independent Geometric(p) random variables.
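The two-step logic of the p.m.f. translates directly into code; a truncated sum then confirms the probabilities total 1 and the mean is r/p (r = 3, p = 0.4 are arbitrary).

```python
from math import comb

def nb_pmf(x, r, p):
    # C(x-1, r-1) * p^r * (1-p)^(x-r): exactly (r-1) successes somewhere
    # in the first (x-1) trials, then a success on trial x itself.
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

r, p = 3, 0.4
xs = range(r, 300)          # tail mass beyond x = 300 is negligible

total = sum(nb_pmf(x, r, p) for x in xs)       # ≈ 1
mean = sum(x * nb_pmf(x, r, p) for x in xs)    # ≈ r/p = 7.5
```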
4.7 Hypergeometric Distribution
- Story: This is the "Binomial without replacement." It's the number of successes (X) you get in a sample of size n, drawn without replacement from a finite population of size N that contains K successes.
- Example: An urn contains 50 balls (N=50), 20 of which are red (K=20). You draw 10 balls (n=10) *without replacement*. X is the number of red balls you drew.
- Parameters:
N (total population size), K (total number of successes in population), n (sample size).
- P.M.F.:
p(x) = [ C(K, x) * C(N-K, n-x) ] / C(N, n)
Logic: (Ways to choose x successes from K) * (Ways to choose n-x failures from N-K) / (Total ways to choose n items from N).
- Mean: E[X] = n * (K / N) = n * p (where p = K/N is the initial proportion of successes).
- The mean is the same as the Binomial mean!
- Variance: Var(X) = n * (K/N) * (1 - K/N) * [ (N-n) / (N-1) ]
The term (N-n)/(N-1) is the Finite Population Correction (FPC) factor. As N → ∞, the FPC → 1, and the variance becomes the Binomial variance. This is why Hypergeometric → Binomial as the population size gets large.
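Using the urn example from above (N=50, K=20, n=10), the counting-based p.m.f. can be checked against the mean n*(K/N) and the FPC-corrected variance.

```python
from math import comb

def hyper_pmf(x, N, K, n):
    # C(K, x) * C(N-K, n-x) / C(N, n)
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

N, K, n = 50, 20, 10        # urn: 50 balls, 20 red, draw 10 w/o replacement
xs = range(0, min(n, K) + 1)

mean = sum(x * hyper_pmf(x, N, K, n) for x in xs)
var = sum((x - mean) ** 2 * hyper_pmf(x, N, K, n) for x in xs)

p = K / N                   # initial proportion of successes = 0.4
fpc = (N - n) / (N - 1)     # finite population correction factor
# mean ≈ n*p = 4;  var ≈ n*p*(1-p)*fpc, smaller than the Binomial's npq
```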
4.8 Summary Table & Relationships