Knowlet

Unit 3: Generating Functions and Conditional Moments

3.1 Moments and Cumulants

Moments

Moments are expectations of powers of a random variable. They describe the shape of a distribution (central tendency, dispersion, skewness, kurtosis).

  • Raw Moments (Moments about the origin): Denoted μ_r' (mu-r-prime).
  • μ_r' = E[X^r]
    • μ₁' = E[X¹] = Mean (μ)
    • μ₂' = E[X²] (Used to calculate variance)
  • Central Moments (Moments about the mean): Denoted μ_r.
  • μ_r = E[(X - μ)^r]
    • μ₁ = E[X - μ] = E[X] - E[μ] = μ - μ = 0 (always)
    • μ₂ = E[(X - μ)²] = Variance (σ²)
    • μ₃ = E[(X - μ)³] (Used to measure skewness - asymmetry)
    • μ₄ = E[(X - μ)⁴] (Used to measure kurtosis - "tailedness" or "peakedness")

Relationship: μ₂ = E[X²] - (E[X])² = μ₂' - (μ₁')²
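The definitions and the relationship above can be checked on a small example. The sketch below assumes a fair six-sided die as the distribution (a choice made here purely for illustration) and uses exact fractions so that the two routes to the variance can be compared without rounding. The helper names raw_moment and central_moment are hypothetical.

```python
from fractions import Fraction

# Assumed example distribution: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def raw_moment(pmf, r):
    """r-th raw moment: E[X^r]."""
    return sum(p * x**r for x, p in pmf.items())

def central_moment(pmf, r):
    """r-th central moment: E[(X - mu)^r]."""
    mu = raw_moment(pmf, 1)
    return sum(p * (x - mu)**r for x, p in pmf.items())

mean = raw_moment(pmf, 1)                 # 7/2
variance = central_moment(pmf, 2)         # 35/12, from the definition
shortcut = raw_moment(pmf, 2) - mean**2   # 35/12, from the relationship

print(mean, variance, shortcut)
print(central_moment(pmf, 1))             # first central moment is 0
```

Both routes give 35/12, and the first central moment comes out as exactly 0.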

Cumulants

Cumulants, denoted κ_r (kappa-r), are another set of descriptive constants. Their key property is additivity for independent variables: if X and Y are independent, the r-th cumulant of (X+Y) is κ_r(X) + κ_r(Y).

  • κ₁ = μ₁' = Mean
  • κ₂ = μ₂ = Variance (σ²)
  • κ₃ = μ₃ (Skewness measure)
  • κ₄ = μ₄ - 3(μ₂)² (Kurtosis measure)
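The additivity property can be illustrated with the four formulas above. The following is a minimal sketch, assuming a biased coin and a fair die as the two independent variables (both chosen here only as examples). The helpers cumulants and sum_pmf are hypothetical names.

```python
from fractions import Fraction
from itertools import product

def cumulants(pmf):
    """First four cumulants, using the moment formulas listed above."""
    mean = sum(p * x for x, p in pmf.items())

    def central(r):
        return sum(p * (x - mean)**r for x, p in pmf.items())

    return [mean, central(2), central(3), central(4) - 3 * central(2)**2]

def sum_pmf(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y (discrete convolution)."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

# Assumed inputs: a biased coin and a fair die.
coin = {0: Fraction(2, 3), 1: Fraction(1, 3)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

k_sum = cumulants(sum_pmf(coin, die))
k_added = [a + b for a, b in zip(cumulants(coin), cumulants(die))]
print(k_sum == k_added)   # exact rational arithmetic, so equality is exact
```

Note that the fourth central moment μ₄ on its own is not additive; it is the correction term 3(μ₂)² that makes κ₄ additive.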

Cumulants are generated by the Cumulant Generating Function (C.G.F.).

3.2 Moment Generating Function (M.G.F.)

The Moment Generating Function (M.G.F.) is a function M_X(t) that, if it exists, can be used to easily generate all the raw moments of a distribution.

Definition

  • For a Discrete RV (X):
  • M_X(t) = E[e^(tX)] = Σ_x e^(tx) * p(x)
  • For a Continuous RV (X):
  • M_X(t) = E[e^(tX)] = ∫_{-∞}^{+∞} e^(tx) * f(x) dx

The M.G.F. is defined only for values of 't' for which this expectation exists (i.e., the sum/integral converges).

How it Generates Moments

The r-th raw moment (μ_r') is the r-th derivative of the M.G.F. with respect to 't', evaluated at t=0.

μ_r' = E[X^r] = (d^r / dt^r) M_X(t) |_{t=0}
  • E[X] = M_X'(0)
  • E[X²] = M_X''(0)
  • ...and so on.

Why does this work? Expand e^(tX) using a Taylor series:

e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + ...

E[e^(tX)] = E[1 + tX + (t²X²)/2! + ...] = 1 + t*E[X] + (t²/2!)*E[X²] + ...

This is a power series in 't'. If you differentiate once w.r.t. 't' and set t=0, only the coefficient of 't' (which is E[X]) remains. Differentiate twice and set t=0, and you get E[X²].
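The derivative rule can be checked numerically. The sketch below assumes a fair six-sided die (an example chosen here) and approximates the derivatives at t=0 with central finite differences rather than differentiating symbolically, so the results are approximations. The step size h is an arbitrary small value.

```python
import math

# Assumed example distribution: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    """M_X(t) = sum over x of e^(t*x) * p(x)."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

h = 1e-4  # finite-difference step (assumed; small but not tiny)

# Central differences approximate M'(0) and M''(0).
first = (mgf(h) - mgf(-h)) / (2 * h)
second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

print(first, second)  # close to E[X] = 3.5 and E[X^2] = 91/6
```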

Properties of M.G.F.

  1. Effect of Linear Transformation (aX + b):
     M_{aX+b}(t) = E[e^(t(aX+b))] = E[e^(atX) * e^(bt)] = e^(bt) * E[e^((at)X)] = e^(bt) * M_X(at)
  2. Sum of Independent Variables (X + Y): If X and Y are independent:
     M_{X+Y}(t) = E[e^(t(X+Y))] = E[e^(tX) * e^(tY)] = E[e^(tX)] * E[e^(tY)] = M_X(t) * M_Y(t)

    The M.G.F. of a sum of independent variables is the product of their individual M.G.F.s.
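Both the linear-transformation rule and the product rule for independent sums can be checked numerically. This is a minimal sketch, assuming a fair die for X, a fair coin for Y, and arbitrary values of a, b and t (all chosen here for illustration). The M.G.F. of the transformed or summed variable is computed directly from its own pmf and compared with the formula.

```python
import math
from itertools import product

def mgf(pmf, t):
    """M.G.F. of a discrete variable given as {value: probability}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

die = {k: 1 / 6 for k in range(1, 7)}   # assumed X
coin = {0: 0.5, 1: 0.5}                 # assumed Y, independent of X
a, b, t = 2.0, 3.0, 0.3                 # arbitrary test values

# Linear transformation: build the pmf of aX + b and compare with e^(bt) * M_X(at).
shifted = {a * x + b: p for x, p in die.items()}
lhs_linear = mgf(shifted, t)
rhs_linear = math.exp(b * t) * mgf(die, a * t)

# Independent sum: build the pmf of X + Y by convolution and compare with M_X(t) * M_Y(t).
total = {}
for (x, px), (y, py) in product(die.items(), coin.items()):
    total[x + y] = total.get(x + y, 0.0) + px * py
lhs_sum = mgf(total, t)
rhs_sum = mgf(die, t) * mgf(coin, t)

print(math.isclose(lhs_linear, rhs_linear), math.isclose(lhs_sum, rhs_sum))
```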

3.3 Cumulant Generating Function (C.G.F.)

The Cumulant Generating Function (C.G.F.) is simply the natural logarithm of the M.G.F. It is used to generate cumulants.

K_X(t) = log( M_X(t) )

How it Generates Cumulants

The r-th cumulant (κ_r) is the r-th derivative of the C.G.F. with respect to 't', evaluated at t=0.

κ_r = (d^r / dt^r) K_X(t) |_{t=0}
  • κ₁ = K_X'(0) = Mean
  • κ₂ = K_X''(0) = Variance
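As a numerical illustration, the sketch below uses the Poisson distribution, whose M.G.F. is the standard result exp(λ(e^t - 1)); that formula is not derived in these notes and is taken here as given. The rate lam is an arbitrary example value, and the derivatives are approximated with central finite differences.

```python
import math

lam = 2.5  # assumed Poisson rate, chosen only for illustration

def mgf(t):
    """Standard Poisson M.G.F.: exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1.0))

def cgf(t):
    """C.G.F. is the natural log of the M.G.F."""
    return math.log(mgf(t))

h = 1e-4  # finite-difference step (assumed)

kappa1 = (cgf(h) - cgf(-h)) / (2 * h)                  # approximates K'(0)
kappa2 = (cgf(h) - 2 * cgf(0.0) + cgf(-h)) / h**2      # approximates K''(0)

print(kappa1, kappa2)  # both close to lam: Poisson mean and variance are equal
```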

Property of C.G.F.

If X and Y are independent, K_{X+Y}(t) = log(M_{X+Y}(t)) = log(M_X(t) * M_Y(t)) = log(M_X(t)) + log(M_Y(t)) = K_X(t) + K_Y(t).

The C.G.F. of a sum of independent variables is the sum of their individual C.G.F.s. This is the additivity property of cumulants.

3.4 Characteristic Function

The M.G.F. has a major drawback: it doesn't exist for all distributions (e.g., the Cauchy distribution). The Characteristic Function (C.F.) solves this problem. It *always* exists for *every* distribution.

It is defined using the complex number i = sqrt(-1).

φ_X(t) = E[e^(itX)] = E[cos(tX) + i * sin(tX)]
  • Discrete: φ_X(t) = Σ_x e^(itx) * p(x)
  • Continuous: φ_X(t) = ∫_{-∞}^{+∞} e^(itx) * f(x) dx

The C.F. has properties analogous to those of the M.G.F. (e.g., for sums of independent variables, C.F.s multiply). It can also be used to find moments: when the r-th moment exists, E[X^r] = (1/i^r) * φ_X^(r)(0), where φ_X^(r) is the r-th derivative of φ_X.
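The definition and the moment formula can be tried out with Python's complex-number support. The sketch below assumes a fair six-sided die (an example chosen here) and approximates the derivatives at t=0 with central finite differences, so the recovered moments are approximate.

```python
import cmath

# Assumed example distribution: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def cf(t):
    """phi_X(t) = sum over x of e^(i*t*x) * p(x)."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

h = 1e-4  # finite-difference step (assumed)

d1 = (cf(h) - cf(-h)) / (2 * h)                 # approximates phi'(0)
d2 = (cf(h) - 2 * cf(0.0) + cf(-h)) / h**2      # approximates phi''(0)

mean = (d1 / 1j).real               # E[X]   = phi'(0) / i
second_moment = (d2 / 1j**2).real   # E[X^2] = phi''(0) / i^2

# |phi(t)| never exceeds 1, which is why the C.F. always exists.
print(abs(cf(5.0)) <= 1.0, mean, second_moment)
```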

3.5 Uniqueness and Inversion Theorems (Without Proof)

These theorems are the reason generating functions are so important. The syllabus states they are without proof, so you only need to understand their application.

Uniqueness Theorem

Theorem: If two random variables X and Y have M.G.F.s M_X(t) and M_Y(t) that are equal (i.e., M_X(t) = M_Y(t) for all t in an open interval around 0), then X and Y have the same probability distribution.

The same theorem holds for Characteristic Functions, and it's more powerful because the C.F. always exists.

Application (How to use this in an exam):
  1. You are asked to find the distribution of Y = X₁ + X₂.
  2. You know X₁ and X₂ are independent and from a known distribution (e.g., Normal or Poisson).
  3. Find the M.G.F.s M_{X₁}(t) and M_{X₂}(t).
  4. Find the M.G.F. of Y: M_Y(t) = M_{X₁}(t) * M_{X₂}(t).
  5. Simplify the resulting function, MY(t).
  6. Recognize this new M.G.F. as the M.G.F. of another known distribution.
  7. By the Uniqueness Theorem, Y *must* follow that distribution.

Example: If X ~ Normal(μ₁, σ₁²) and Y ~ Normal(μ₂, σ₂²) are independent, their MGFs multiply. The resulting MGF is recognizable as a Normal MGF with mean (μ₁+μ₂) and variance (σ₁²+σ₂²). Therefore, (X+Y) is also normally distributed.
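The Normal example can be checked numerically using the standard Normal M.G.F., exp(μt + σ²t²/2), which is not derived in these notes and is taken here as given. The parameter values and the helper name normal_mgf are assumptions made for illustration. The check confirms that the product of the two M.G.F.s equals the M.G.F. with summed means and summed variances at several values of t.

```python
import math

def normal_mgf(t, mu, var):
    """Standard Normal(mu, var) M.G.F.: exp(mu*t + var*t^2/2)."""
    return math.exp(mu * t + 0.5 * var * t * t)

# Assumed example parameters.
mu1, var1 = 1.0, 4.0
mu2, var2 = -2.0, 9.0

matches = all(
    math.isclose(
        normal_mgf(t, mu1, var1) * normal_mgf(t, mu2, var2),   # M_X(t) * M_Y(t)
        normal_mgf(t, mu1 + mu2, var1 + var2),                 # candidate M.G.F. of X + Y
    )
    for t in (-1.0, -0.5, 0.0, 0.5, 1.0)
)
print(matches)
```

A numerical match at a few points is only a sanity check; the exam argument is the algebraic one, where the exponents add.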

Inversion Theorem

Theorem: A Characteristic Function φ_X(t) corresponds to exactly one probability distribution, and hence to a unique p.d.f./p.m.f. f(x) when one exists.

This means the C.F. contains all the information about the distribution. The "inversion formula" (which you don't need) is a way to retrieve the p.d.f. from the C.F. using an integral. The key takeaway is that the C.F. uniquely defines the distribution.
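For the curious, the idea of inversion can be seen in one special case. For an integer-valued variable, the p.m.f. can be recovered as p(k) = (1/2π) ∫ from -π to π of e^(-itk) * φ_X(t) dt. The sketch below applies this special case (not the general inversion formula) to a fair die, an example assumed here, approximating the integral with an equally spaced sum. The function name invert and the grid size n are hypothetical choices.

```python
import cmath
import math

# Assumed example distribution: a fair six-sided die.
pmf = {k: 1 / 6 for k in range(1, 7)}

def cf(t):
    """phi_X(t) = sum over x of e^(i*t*x) * p(x)."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

def invert(k, n=64):
    """Approximate (1/2pi) * integral over [-pi, pi] of e^(-i*t*k) * cf(t) dt."""
    step = 2 * math.pi / n
    total = sum(
        cmath.exp(-1j * t * k) * cf(t)
        for t in (-math.pi + j * step for j in range(n))
    )
    return (total * step / (2 * math.pi)).real

# Recover probabilities for k = 0..7; values 0 and 7 are outside the die's support.
recovered = {k: invert(k) for k in range(0, 8)}
for k, value in recovered.items():
    print(k, f"{abs(value):.6f}")
```

The recovered values are close to 1/6 for k = 1..6 and close to 0 elsewhere, which is the original p.m.f.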

3.6 Conditional Expectation and Variance

Conditional Expectation

Conditional Expectation, E[Y | X=x], is the expected value (mean) of Y, calculated using the conditional distribution f(y|x). It represents our best guess for Y, *given that we know* X has taken the value x.

E[Y | X=x] = ∫_{-∞}^{+∞} y * f(y | x) dy

Important: E[Y | X=x] is a function of x. If we let 'x' be random again, we get E[Y | X], which is a random variable (its value depends on X).

Law of Total Expectation (or Tower Property):

This law connects conditional expectation back to the marginal expectation. It states that the overall average (E[Y]) is the average of the conditional averages (E[Y|X]).

E[Y] = E_X[ E[Y | X] ]

In practice: You find E[Y|X=x], which is a function of x. Then you take the expectation of that function with respect to the distribution of X.
E[Y] = ∫ E[Y | X=x] * f_X(x) dx.
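The same two-step recipe works in the discrete case, with sums in place of integrals. The sketch below assumes a small made-up joint p.m.f. for (X, Y); the table values and the helper names marginal_x and cond_mean_y are hypothetical. It computes E[Y] both through the conditional means and directly.

```python
from fractions import Fraction as F

# Assumed joint pmf p(x, y), invented for illustration; probabilities sum to 1.
joint = {
    (0, 1): F(1, 8), (0, 2): F(3, 8),
    (1, 1): F(1, 4), (1, 3): F(1, 4),
}

def marginal_x(joint):
    """P(X = x), obtained by summing the joint pmf over y."""
    out = {}
    for (x, _), p in joint.items():
        out[x] = out.get(x, 0) + p
    return out

def cond_mean_y(joint, x):
    """E[Y | X = x], using the conditional pmf p(x, y) / P(X = x)."""
    px = marginal_x(joint)[x]
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / px

p_x = marginal_x(joint)

# Step 1: conditional means (a function of x). Step 2: average them over X.
tower = sum(cond_mean_y(joint, x) * px for x, px in p_x.items())
direct = sum(y * p for (_, y), p in joint.items())

print(tower, direct)  # both 15/8
```

Here E[Y | X=0] = 7/4 and E[Y | X=1] = 2, each weighted by P(X=x) = 1/2.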

Conditional Variance

Conditional Variance, Var(Y | X=x), is the variance of Y, calculated using the conditional distribution f(y|x). It measures the "spread" or "uncertainty" of Y, *given that we know* X=x.

Var(Y | X=x) = E[ (Y - E[Y | X=x])² | X=x ]

Computational Formula:

Var(Y | X=x) = E[Y² | X=x] - ( E[Y | X=x] )²

Like conditional expectation, Var(Y | X) is a random variable (a function of X).

Law of Total Variance (or EVE's Law):

This law decomposes the total variance of Y into two parts: the "unexplained" variance (what remains once X is known) and the "explained" variance (what is accounted for by X).

Var(Y) = E[ Var(Y | X) ] + Var( E[Y | X] )
  • E[ Var(Y | X) ]: The "Mean of the Conditional Variances." This is the part of the variance that remains *even after* we know X (the average "unexplained" variance).
  • Var( E[Y | X] ): The "Variance of the Conditional Means." This is the part of the variance that is *explained* by X (how much the conditional mean of Y changes as X changes).
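The decomposition can be verified on a small discrete example. The sketch below assumes a made-up joint p.m.f. for (X, Y); the table values and the helper name cond_stats are hypothetical. It computes each conditional variance with the computational formula, then compares the two-part sum with the variance of Y computed directly.

```python
from fractions import Fraction as F

# Assumed joint pmf p(x, y), invented for illustration; probabilities sum to 1.
joint = {
    (0, 1): F(1, 8), (0, 2): F(3, 8),
    (1, 1): F(1, 4), (1, 3): F(1, 4),
}

def cond_stats(joint):
    """Return {x: (P(X=x), E[Y | X=x], Var(Y | X=x))}."""
    stats = {}
    for x in {x for x, _ in joint}:
        rows = [(y, p) for (xx, y), p in joint.items() if xx == x]
        px = sum(p for _, p in rows)
        m1 = sum(y * p for y, p in rows) / px        # E[Y | X=x]
        m2 = sum(y * y * p for y, p in rows) / px    # E[Y^2 | X=x]
        stats[x] = (px, m1, m2 - m1**2)              # computational formula
    return stats

stats = cond_stats(joint)
mean_y = sum(px * m for px, m, _ in stats.values())

unexplained = sum(px * v for px, _, v in stats.values())               # E[Var(Y | X)]
explained = sum(px * (m - mean_y)**2 for px, m, _ in stats.values())   # Var(E[Y | X])
total = sum(p * (y - mean_y)**2 for (_, y), p in joint.items())        # Var(Y) directly

print(unexplained, explained, total)  # 19/32 1/64 39/64
```

In this example almost all of the variance is unexplained: 19/32 + 1/64 = 39/64.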
