Unit 3: Generating Functions and Conditional Moments

3.1 Moments and Cumulants

Moments

Moments are expectations of powers of a random variable. The r-th raw moment (about the origin) is μ_r' = E[X^r], and the r-th central moment (about the mean μ) is μ_r = E[(X − μ)^r]. Together they describe the shape of a distribution (central tendency, dispersion, skewness, kurtosis).

Relationship: the second central moment (the variance) satisfies μ₂ = E[X²] − (E[X])² = μ₂' − (μ₁')²
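As a quick check of this relationship, here is a minimal Python sketch using a fair six-sided die as an illustrative distribution; it computes the variance both from raw moments and directly as a central moment:

```python
# Illustrative example: verify mu_2 = mu_2' - (mu_1')^2 for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mu1_raw = sum(x * p for x in outcomes)      # mu_1' = E[X]
mu2_raw = sum(x**2 * p for x in outcomes)   # mu_2' = E[X^2]

variance = mu2_raw - mu1_raw**2             # mu_2 via raw moments
central = sum((x - mu1_raw)**2 * p for x in outcomes)  # mu_2 computed directly

print(variance)  # ≈ 35/12 ≈ 2.9167
print(abs(variance - central) < 1e-9)  # the two routes agree
```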

Cumulants

Cumulants, denoted κ_r (kappa-r), are another set of descriptive constants. Their key property is additivity for independent variables: if X and Y are independent, the r-th cumulant of (X+Y) is κ_r(X) + κ_r(Y).

Cumulants are generated by the Cumulant Generating Function (C.G.F.).

3.2 Moment Generating Function (M.G.F.)

The Moment Generating Function (M.G.F.) is a function M_X(t) that, if it exists, can be used to easily generate all the raw moments of a distribution.

Definition

M_X(t) = E[e^(tX)] = Σ e^(tx) p(x)  (discrete case)   or   ∫ e^(tx) f(x) dx  (continuous case)

The M.G.F. is defined only for values of 't' for which this expectation exists (i.e., the sum/integral converges), typically in an open interval around t = 0.

How it Generates Moments

The r-th raw moment (μ_r') is the r-th derivative of the M.G.F. with respect to 't', evaluated at t = 0.

μ_r' = E[X^r] = (d^r/dt^r) M_X(t) |_(t=0)

Why does this work? If you expand e^(tX) as a Taylor series:

e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + ...

E[e^(tX)] = E[1 + tX + (t²X²)/2! + ...] = 1 + t·E[X] + (t²/2!)·E[X²] + ...

This is a power series in 't'. Differentiate once w.r.t. 't' and set t = 0, and only the coefficient of 't' (which is E[X]) survives. Differentiate twice and you get E[X²].
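To make the moment-generating property concrete, here is a small sketch assuming a Bernoulli(p) variable, whose M.G.F. has the known closed form M(t) = 1 − p + p·e^t. Numerical central differences at t = 0 recover E[X] and E[X²] (both equal p for a Bernoulli, since X² = X):

```python
import math

p = 0.3  # illustrative parameter
def M(t):
    # MGF of Bernoulli(p): M(t) = 1 - p + p * e^t
    return 1 - p + p * math.exp(t)

h = 1e-4
# First derivative at 0 approximates E[X]
m1 = (M(h) - M(-h)) / (2 * h)
# Second derivative at 0 approximates E[X^2]
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2

print(m1)  # ≈ 0.3 (E[X] = p)
print(m2)  # ≈ 0.3 (E[X^2] = p, since X^2 = X)
```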

Properties of M.G.F.

  1. Effect of Linear Transformation (aX + b):

     M_(aX+b)(t) = E[e^(t(aX+b))] = E[e^(atX) · e^(bt)] = e^(bt) · E[e^((at)X)] = e^(bt) · M_X(at)

  2. Sum of Independent Variables (X + Y): If X and Y are independent:

     M_(X+Y)(t) = E[e^(t(X+Y))] = E[e^(tX) · e^(tY)] = E[e^(tX)] · E[e^(tY)] = M_X(t) · M_Y(t)

    The M.G.F. of a sum of independent variables is the product of their individual M.G.F.s.
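A small sketch can verify the product property by brute-force enumeration; two independent fair dice are used here as an illustrative choice:

```python
import math

def mgf(pmf, t):
    # M(t) = sum over x of e^(t*x) * P(X = x)
    return sum(math.exp(t * x) * px for x, px in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # pmf of a fair die

# pmf of the sum of two independent dice, by enumerating the joint distribution
sum_pmf = {}
for x in die:
    for y in die:
        sum_pmf[x + y] = sum_pmf.get(x + y, 0) + die[x] * die[y]

t = 0.7
lhs = mgf(sum_pmf, t)            # M_(X+Y)(t), computed from the sum's pmf
rhs = mgf(die, t) * mgf(die, t)  # M_X(t) * M_Y(t)
print(abs(lhs - rhs) < 1e-9)     # True: MGFs of independent sums multiply
```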

3.3 Cumulant Generating Function (C.G.F.)

The Cumulant Generating Function (C.G.F.) is simply the natural logarithm of the M.G.F. It is used to generate cumulants.

K_X(t) = log( M_X(t) )

How it Generates Cumulants

The r-th cumulant (κ_r) is the r-th derivative of the C.G.F. with respect to 't', evaluated at t = 0.

κ_r = (d^r/dt^r) K_X(t) |_(t=0)

Property of C.G.F.

If X and Y are independent, K_(X+Y)(t) = log(M_(X+Y)(t)) = log(M_X(t) · M_Y(t)) = log(M_X(t)) + log(M_Y(t)) = K_X(t) + K_Y(t).

The C.G.F. of a sum of independent variables is the sum of their individual C.G.F.s. This is the additivity property of cumulants.
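As an illustration, for a Poisson(λ) variable the C.G.F. has the known closed form K(t) = λ(e^t − 1), so every cumulant equals λ. A sketch differentiating K numerically at t = 0 (λ = 2.5 is an illustrative value):

```python
import math

lam = 2.5  # illustrative Poisson rate
def K(t):
    # CGF of Poisson(lam): K(t) = log M(t) = lam * (e^t - 1)
    return lam * (math.exp(t) - 1)

h = 1e-4
k1 = (K(h) - K(-h)) / (2 * h)          # kappa_1 = mean
k2 = (K(h) - 2 * K(0) + K(-h)) / h**2  # kappa_2 = variance

print(k1)  # ≈ 2.5
print(k2)  # ≈ 2.5 (for Poisson, every cumulant equals lam)
```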

3.4 Characteristic Function

The M.G.F. has a major drawback: it doesn't exist for all distributions (e.g., the Cauchy distribution). The Characteristic Function (C.F.) solves this problem. It *always* exists for *every* distribution.

It is defined using the complex number i = sqrt(-1).

φ_X(t) = E[e^(itX)] = E[cos(tX) + i·sin(tX)]

The C.F. has properties analogous to the M.G.F.'s (e.g., for sums of independent variables, C.F.s multiply). It can also be used to find moments: E[X^r] = (1/i^r) · φ_X⁽ʳ⁾(0), where φ_X⁽ʳ⁾ denotes the r-th derivative of φ_X.
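A short sketch, again using a fair die as an illustrative distribution, evaluates φ_X(t) with complex arithmetic and recovers E[X] from a numerical first derivative at 0:

```python
import cmath

die = {x: 1 / 6 for x in range(1, 7)}  # pmf of a fair die

def cf(t):
    # phi(t) = E[e^(itX)] -- always exists, since |e^(itX)| = 1
    return sum(cmath.exp(1j * t * x) * px for x, px in die.items())

h = 1e-5
phi_prime = (cf(h) - cf(-h)) / (2 * h)  # phi'(0), by central difference
mean = (phi_prime / 1j).real            # E[X] = phi'(0) / i
print(mean)  # ≈ 3.5
```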

3.5 Uniqueness and Inversion Theorems (Without Proof)

These theorems are the reason generating functions are so important. The syllabus lists them as "without proof", so you only need to understand their application.

Uniqueness Theorem

Theorem: If two random variables X and Y have M.G.F.s M_X(t) and M_Y(t) that are equal (i.e., M_X(t) = M_Y(t) for all t in an open interval around 0), then X and Y have the same probability distribution.

The same theorem holds for Characteristic Functions, and it's more powerful because the C.F. always exists.

Application (How to use this in an exam):
  1. You are asked to find the distribution of Y = X₁ + X₂.
  2. You know X₁ and X₂ are independent and from a known distribution (e.g., Normal or Poisson).
  3. Find the M.G.F.s M_X₁(t) and M_X₂(t).
  4. Find the M.G.F. of Y: M_Y(t) = M_X₁(t) · M_X₂(t).
  5. Simplify the resulting function, M_Y(t).
  6. Recognize this new M.G.F. as the M.G.F. of another known distribution.
  7. By the Uniqueness Theorem, Y *must* follow that distribution.

Example: If X₁ ~ Normal(μ₁, σ₁²) and X₂ ~ Normal(μ₂, σ₂²) are independent, their M.G.F.s multiply. The resulting M.G.F. is recognizable as a Normal M.G.F. with mean (μ₁+μ₂) and variance (σ₁²+σ₂²). Therefore, (X₁+X₂) is also normally distributed.
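The closed form of the Normal M.G.F., M(t) = exp(μt + σ²t²/2), makes this easy to check numerically; the parameter values below are illustrative:

```python
import math

def normal_mgf(mu, var, t):
    # MGF of Normal(mu, var): exp(mu*t + var*t^2 / 2)
    return math.exp(mu * t + var * t**2 / 2)

mu1, v1 = 1.0, 4.0   # X1 ~ Normal(1, 4)  (illustrative values)
mu2, v2 = -2.0, 9.0  # X2 ~ Normal(-2, 9)

t = 0.3
product = normal_mgf(mu1, v1, t) * normal_mgf(mu2, v2, t)
combined = normal_mgf(mu1 + mu2, v1 + v2, t)  # MGF of Normal(mu1+mu2, v1+v2)
print(abs(product - combined) < 1e-9)  # True: the product is a Normal MGF
```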

Inversion Theorem

Theorem: Given a Characteristic Function φ_X(t), there is a unique corresponding p.d.f./p.m.f. f(x).

This means the C.F. contains all the information about the distribution. The "inversion formula" (which you don't need) is a way to retrieve the p.d.f. from the C.F. using an integral. The key takeaway is that the C.F. uniquely defines the distribution.

3.6 Conditional Expectation and Variance

Conditional Expectation

Conditional Expectation, E[Y | X=x], is the expected value (mean) of Y, calculated using the conditional distribution f(y|x). It represents our best guess for Y, *given that we know* X has taken the value x.

E[Y | X=x] = ∫_(−∞)^(+∞) y · f(y | x) dy

Important: E[Y | X=x] is a function of x. If we let 'x' be random again, we get E[Y | X], which is a random variable (its value depends on X).

Law of Total Expectation (or Tower Property):

This law connects conditional expectation back to the marginal expectation. It states that the overall average (E[Y]) is the average of the conditional averages (E[Y|X]).

E[Y] = E_X[ E[Y | X] ]

In practice: You find E[Y|X=x], which is a function of x. Then you take the expectation of that function with respect to the distribution of X.
E[Y] = ∫ (E[Y | X=x]) · f_X(x) dx.
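Here is a minimal discrete sketch of the tower property, assuming a made-up two-stage model (X uniform on {1, 2, 3}; given X = x, Y uniform on {1, ..., x}), computed exactly with fractions:

```python
from fractions import Fraction

# Made-up two-stage model: X uniform on {1, 2, 3}; given X=x, Y uniform on {1..x}.
px = {x: Fraction(1, 3) for x in (1, 2, 3)}

def e_y_given(x):
    # E[Y | X=x] = (1 + ... + x) / x = (x + 1) / 2
    return Fraction(sum(range(1, x + 1)), x)

# Tower property: E[Y] = E_X[ E[Y | X] ]
e_y = sum(e_y_given(x) * p for x, p in px.items())

# Direct computation from the joint distribution, for comparison
e_y_direct = sum(Fraction(1, x) * p * y
                 for x, p in px.items() for y in range(1, x + 1))

print(e_y)                 # 3/2
print(e_y == e_y_direct)   # True: both routes give the same answer
```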

Conditional Variance

Conditional Variance, Var(Y | X=x), is the variance of Y, calculated using the conditional distribution f(y|x). It measures the "spread" or "uncertainty" of Y, *given that we know* X=x.

Var(Y | X=x) = E[ (Y − E[Y | X=x])² | X=x ]

Computational Formula:

Var(Y | X=x) = E[Y² | X=x] - ( E[Y | X=x] )²

Like conditional expectation, Var(Y | X) is a random variable (a function of X).

Law of Total Variance (or EVE's Law):

This law decomposes the total variance of Y into two parts: the "explained" variance and the "unexplained" variance.

Var(Y) = E[ Var(Y | X) ] + Var( E[Y | X] )
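The decomposition can be verified exactly on a made-up two-stage model (X uniform on {1, 2, 3}; given X = x, Y uniform on {1, ..., x}), again using fractions:

```python
from fractions import Fraction

# Made-up two-stage model: X uniform on {1, 2, 3}; given X=x, Y uniform on {1..x}.
px = {x: Fraction(1, 3) for x in (1, 2, 3)}
cond = {x: {y: Fraction(1, x) for y in range(1, x + 1)} for x in px}

def e(pmf):    # mean of a pmf
    return sum(y * p for y, p in pmf.items())

def var(pmf):  # variance of a pmf: E[Y^2] - (E[Y])^2
    return sum(y * y * p for y, p in pmf.items()) - e(pmf) ** 2

# Marginal pmf of Y, by summing out X
py = {}
for x, p in px.items():
    for y, q in cond[x].items():
        py[y] = py.get(y, 0) + p * q

ey_given = {x: e(cond[x]) for x in px}    # E[Y | X=x], a function of x
vy_given = {x: var(cond[x]) for x in px}  # Var(Y | X=x)

unexplained = sum(px[x] * vy_given[x] for x in px)           # E[ Var(Y|X) ]
m = sum(px[x] * ey_given[x] for x in px)                     # E[ E[Y|X] ] = E[Y]
explained = sum(px[x] * (ey_given[x] - m) ** 2 for x in px)  # Var( E[Y|X] )

print(var(py) == explained + unexplained)  # True: EVE's law holds exactly
```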