Unit 3: Generating Functions and Conditional Moments

3.1 Moments and Cumulants

Moments

Moments are expectations of powers of a random variable. The r-th raw moment (about the origin) is μ_r' = E[X^r], and the r-th central moment (about the mean μ) is μ_r = E[(X − μ)^r]. Together they describe the shape of a distribution (central tendency, dispersion, skewness, kurtosis).

Relationship: the second central moment (the variance) satisfies μ₂ = E[X²] − (E[X])² = μ₂' − (μ₁')²
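As a quick check of this relationship, here is a minimal Python sketch using a fair six-sided die as an illustrative distribution; it computes the variance both from raw moments and directly as a central moment:

```python
# Illustrative example: verify mu_2 = mu_2' - (mu_1')^2 for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mu1_raw = sum(x * p for x in outcomes)      # mu_1' = E[X]
mu2_raw = sum(x**2 * p for x in outcomes)   # mu_2' = E[X^2]

variance = mu2_raw - mu1_raw**2             # mu_2 via raw moments
central = sum((x - mu1_raw)**2 * p for x in outcomes)  # mu_2 computed directly

print(variance)  # ≈ 35/12 ≈ 2.9167
print(abs(variance - central) < 1e-9)  # the two routes agree
```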

Cumulants

Cumulants, denoted κ_r (kappa-r), are another set of descriptive constants. Their key property is additivity for independent variables: if X and Y are independent, the r-th cumulant of (X+Y) is κ_r(X) + κ_r(Y).

Cumulants are generated by the Cumulant Generating Function (C.G.F.).

3.2 Moment Generating Function (M.G.F.)

The Moment Generating Function (M.G.F.) is a function M_X(t) that, if it exists, can be used to easily generate all the raw moments of a distribution.

Definition

M_X(t) = E[e^(tX)] = Σ e^(tx) p(x)  (discrete case)   or   ∫ e^(tx) f(x) dx  (continuous case)

The M.G.F. is defined only for values of 't' for which this expectation exists (i.e., the sum/integral converges), typically in an open interval around t = 0.

How it Generates Moments

The r-th raw moment (μ_r') is the r-th derivative of the M.G.F. with respect to 't', evaluated at t = 0.

μ_r' = E[X^r] = (d^r/dt^r) M_X(t) |_(t=0)

Why does this work? If you expand e^(tX) as a Taylor series:

e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + ...

E[e^(tX)] = E[1 + tX + (t²X²)/2! + ...] = 1 + t·E[X] + (t²/2!)·E[X²] + ...

This is a power series in 't'. Differentiate once w.r.t. 't' and set t = 0, and only the coefficient of 't' (which is E[X]) survives. Differentiate twice and you get E[X²].
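To make the moment-generating property concrete, here is a small sketch assuming a Bernoulli(p) variable, whose M.G.F. has the known closed form M(t) = 1 − p + p·e^t. Numerical central differences at t = 0 recover E[X] and E[X²] (both equal p for a Bernoulli, since X² = X):

```python
import math

p = 0.3  # illustrative parameter
def M(t):
    # MGF of Bernoulli(p): M(t) = 1 - p + p * e^t
    return 1 - p + p * math.exp(t)

h = 1e-4
# First derivative at 0 approximates E[X]
m1 = (M(h) - M(-h)) / (2 * h)
# Second derivative at 0 approximates E[X^2]
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2

print(m1)  # ≈ 0.3 (E[X] = p)
print(m2)  # ≈ 0.3 (E[X^2] = p, since X^2 = X)
```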

Properties of M.G.F.

  1. Effect of Linear Transformation (aX + b):

     M_(aX+b)(t) = E[e^(t(aX+b))] = E[e^(atX) · e^(bt)] = e^(bt) · E[e^((at)X)] = e^(bt) · M_X(at)

  2. Sum of Independent Variables (X + Y): If X and Y are independent:

     M_(X+Y)(t) = E[e^(t(X+Y))] = E[e^(tX) · e^(tY)] = E[e^(tX)] · E[e^(tY)] = M_X(t) · M_Y(t)

    The M.G.F. of a sum of independent variables is the product of their individual M.G.F.s.
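A small sketch can verify the product property by brute-force enumeration; two independent fair dice are used here as an illustrative choice:

```python
import math

def mgf(pmf, t):
    # M(t) = sum over x of e^(t*x) * P(X = x)
    return sum(math.exp(t * x) * px for x, px in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # pmf of a fair die

# pmf of the sum of two independent dice, by enumerating the joint distribution
sum_pmf = {}
for x in die:
    for y in die:
        sum_pmf[x + y] = sum_pmf.get(x + y, 0) + die[x] * die[y]

t = 0.7
lhs = mgf(sum_pmf, t)            # M_(X+Y)(t), computed from the sum's pmf
rhs = mgf(die, t) * mgf(die, t)  # M_X(t) * M_Y(t)
print(abs(lhs - rhs) < 1e-9)     # True: MGFs of independent sums multiply
```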

3.3 Cumulant Generating Function (C.G.F.)

The Cumulant Generating Function (C.G.F.) is simply the natural logarithm of the M.G.F. It is used to generate cumulants.

K_X(t) = log( M_X(t) )

How it Generates Cumulants

The r-th cumulant (κ_r) is the r-th derivative of the C.G.F. with respect to 't', evaluated at t = 0.

κ_r = (d^r/dt^r) K_X(t) |_(t=0)

Property of C.G.F.

If X and Y are independent, K_(X+Y)(t) = log(M_(X+Y)(t)) = log(M_X(t) · M_Y(t)) = log(M_X(t)) + log(M_Y(t)) = K_X(t) + K_Y(t).

The C.G.F. of a sum of independent variables is the sum of their individual C.G.F.s. This is the additivity property of cumulants.
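As an illustration, for a Poisson(λ) variable the C.G.F. has the known closed form K(t) = λ(e^t − 1), so every cumulant equals λ. A sketch differentiating K numerically at t = 0 (λ = 2.5 is an illustrative value):

```python
import math

lam = 2.5  # illustrative Poisson rate
def K(t):
    # CGF of Poisson(lam): K(t) = log M(t) = lam * (e^t - 1)
    return lam * (math.exp(t) - 1)

h = 1e-4
k1 = (K(h) - K(-h)) / (2 * h)          # kappa_1 = mean
k2 = (K(h) - 2 * K(0) + K(-h)) / h**2  # kappa_2 = variance

print(k1)  # ≈ 2.5
print(k2)  # ≈ 2.5 (for Poisson, every cumulant equals lam)
```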

3.4 Characteristic Function

The M.G.F. has a major drawback: it doesn't exist for all distributions (e.g., the Cauchy distribution). The Characteristic Function (C.F.) solves this problem. It *always* exists for *every* distribution.

It is defined using the complex number i = sqrt(-1).

φ_X(t) = E[e^(itX)] = E[cos(tX) + i·sin(tX)]

The C.F. has properties analogous to the M.G.F.'s (e.g., for sums of independent variables, C.F.s multiply). It can also be used to find moments: E[X^r] = (1/i^r) · φ_X⁽ʳ⁾(0), where φ_X⁽ʳ⁾ denotes the r-th derivative of φ_X.
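A short sketch, again using a fair die as an illustrative distribution, evaluates φ_X(t) with complex arithmetic and recovers E[X] from a numerical first derivative at 0:

```python
import cmath

die = {x: 1 / 6 for x in range(1, 7)}  # pmf of a fair die

def cf(t):
    # phi(t) = E[e^(itX)] -- always exists, since |e^(itX)| = 1
    return sum(cmath.exp(1j * t * x) * px for x, px in die.items())

h = 1e-5
phi_prime = (cf(h) - cf(-h)) / (2 * h)  # phi'(0), by central difference
mean = (phi_prime / 1j).real            # E[X] = phi'(0) / i
print(mean)  # ≈ 3.5
```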

3.5 Uniqueness and Inversion Theorems (Without Proof)

These theorems are the reason generating functions are so important. The syllabus lists them as "without proof", so you only need to understand their application.

Uniqueness Theorem

Theorem: If two random variables X and Y have M.G.F.s M_X(t) and M_Y(t) that are equal (i.e., M_X(t) = M_Y(t) for all t in an open interval around 0), then X and Y have the same probability distribution.

The same theorem holds for Characteristic Functions, and it's more powerful because the C.F. always exists.

Application (How to use this in an exam):
  1. You are asked to find the distribution of Y = X₁ + X₂.
  2. You know X₁ and X₂ are independent and from a known distribution (e.g., Normal or Poisson).
  3. Find the M.G.F.s M_X₁(t) and M_X₂(t).
  4. Find the M.G.F. of Y: M_Y(t) = M_X₁(t) · M_X₂(t).
  5. Simplify the resulting function, M_Y(t).
  6. Recognize this new M.G.F. as the M.G.F. of another known distribution.
  7. By the Uniqueness Theorem, Y *must* follow that distribution.

Example: If X₁ ~ Normal(μ₁, σ₁²) and X₂ ~ Normal(μ₂, σ₂²) are independent, their M.G.F.s multiply. The resulting M.G.F. is recognizable as a Normal M.G.F. with mean (μ₁+μ₂) and variance (σ₁²+σ₂²). Therefore, (X₁+X₂) is also normally distributed.
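The closed form of the Normal M.G.F., M(t) = exp(μt + σ²t²/2), makes this easy to check numerically; the parameter values below are illustrative:

```python
import math

def normal_mgf(mu, var, t):
    # MGF of Normal(mu, var): exp(mu*t + var*t^2 / 2)
    return math.exp(mu * t + var * t**2 / 2)

mu1, v1 = 1.0, 4.0   # X1 ~ Normal(1, 4)  (illustrative values)
mu2, v2 = -2.0, 9.0  # X2 ~ Normal(-2, 9)

t = 0.3
product = normal_mgf(mu1, v1, t) * normal_mgf(mu2, v2, t)
combined = normal_mgf(mu1 + mu2, v1 + v2, t)  # MGF of Normal(mu1+mu2, v1+v2)
print(abs(product - combined) < 1e-9)  # True: the product is a Normal MGF
```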

Inversion Theorem

Theorem: Given a Characteristic Function φ_X(t), there is a unique corresponding p.d.f./p.m.f. f(x).

This means the C.F. contains all the information about the distribution. The "inversion formula" (which you don't need) is a way to retrieve the p.d.f. from the C.F. using an integral. The key takeaway is that the C.F. uniquely defines the distribution.

3.6 Conditional Expectation and Variance

Conditional Expectation

Conditional Expectation, E[Y | X=x], is the expected value (mean) of Y, calculated using the conditional distribution f(y|x). It represents our best guess for Y, *given that we know* X has taken the value x.

E[Y | X=x] = ∫_(−∞)^(+∞) y · f(y | x) dy

Important: E[Y | X=x] is a function of x. If we let 'x' be random again, we get E[Y | X], which is a random variable (its value depends on X).

Law of Total Expectation (or Tower Property):

This law connects conditional expectation back to the marginal expectation. It states that the overall average (E[Y]) is the average of the conditional averages (E[Y|X]).

E[Y] = E_X[ E[Y | X] ]

In practice: You find E[Y|X=x], which is a function of x. Then you take the expectation of that function with respect to the distribution of X.
E[Y] = ∫ (E[Y | X=x]) · f_X(x) dx.
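Here is a minimal discrete sketch of the tower property, assuming a made-up two-stage model (X uniform on {1, 2, 3}; given X = x, Y uniform on {1, ..., x}), computed exactly with fractions:

```python
from fractions import Fraction

# Made-up two-stage model: X uniform on {1, 2, 3}; given X=x, Y uniform on {1..x}.
px = {x: Fraction(1, 3) for x in (1, 2, 3)}

def e_y_given(x):
    # E[Y | X=x] = (1 + ... + x) / x = (x + 1) / 2
    return Fraction(sum(range(1, x + 1)), x)

# Tower property: E[Y] = E_X[ E[Y | X] ]
e_y = sum(e_y_given(x) * p for x, p in px.items())

# Direct computation from the joint distribution, for comparison
e_y_direct = sum(Fraction(1, x) * p * y
                 for x, p in px.items() for y in range(1, x + 1))

print(e_y)                 # 3/2
print(e_y == e_y_direct)   # True: both routes give the same answer
```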

Conditional Variance

Conditional Variance, Var(Y | X=x), is the variance of Y, calculated using the conditional distribution f(y|x). It measures the "spread" or "uncertainty" of Y, *given that we know* X=x.

Var(Y | X=x) = E[ (Y − E[Y | X=x])² | X=x ]

Computational Formula:

Var(Y | X=x) = E[Y² | X=x] - ( E[Y | X=x] )²

Like conditional expectation, Var(Y | X) is a random variable (a function of X).

Law of Total Variance (or EVE's Law):

This law decomposes the total variance of Y into two parts: the "explained" variance and the "unexplained" variance.

Var(Y) = E[ Var(Y | X) ] + Var( E[Y | X] )
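The decomposition can be verified exactly on a made-up two-stage model (X uniform on {1, 2, 3}; given X = x, Y uniform on {1, ..., x}), again using fractions:

```python
from fractions import Fraction

# Made-up two-stage model: X uniform on {1, 2, 3}; given X=x, Y uniform on {1..x}.
px = {x: Fraction(1, 3) for x in (1, 2, 3)}
cond = {x: {y: Fraction(1, x) for y in range(1, x + 1)} for x in px}

def e(pmf):    # mean of a pmf
    return sum(y * p for y, p in pmf.items())

def var(pmf):  # variance of a pmf: E[Y^2] - (E[Y])^2
    return sum(y * y * p for y, p in pmf.items()) - e(pmf) ** 2

# Marginal pmf of Y, by summing out X
py = {}
for x, p in px.items():
    for y, q in cond[x].items():
        py[y] = py.get(y, 0) + p * q

ey_given = {x: e(cond[x]) for x in px}    # E[Y | X=x], a function of x
vy_given = {x: var(cond[x]) for x in px}  # Var(Y | X=x)

unexplained = sum(px[x] * vy_given[x] for x in px)           # E[ Var(Y|X) ]
m = sum(px[x] * ey_given[x] for x in px)                     # E[ E[Y|X] ] = E[Y]
explained = sum(px[x] * (ey_given[x] - m) ** 2 for x in px)  # Var( E[Y|X] )

print(var(py) == explained + unexplained)  # True: EVE's law holds exactly
```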