Moments are expectations of powers of a random variable. They describe the shape of a distribution (central tendency, dispersion, skewness, kurtosis).
Relationship between the variance (second central moment) and the raw moments: μ₂ = E[X²] - (E[X])² = μ₂' - (μ₁')².
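The relationship above can be checked numerically on a toy distribution; a fair six-sided die is assumed here purely for illustration:

```python
# Variance via raw moments, mu2 = mu2' - (mu1')^2, for a fair die.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6
mu1 = sum(v * p for v in values)         # mu1' = E[X] = 3.5
mu2_raw = sum(v**2 * p for v in values)  # mu2' = E[X^2] = 91/6
variance = mu2_raw - mu1**2              # mu2 = 35/12 ~ 2.9167
print(variance)
```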
Cumulants, denoted κ_r, are another set of descriptive constants. Their key property is additivity for independent variables: if X and Y are independent, the r-th cumulant of (X+Y) is κ_r(X) + κ_r(Y).
Cumulants are generated by the Cumulant Generating Function (C.G.F.).
The Moment Generating Function (M.G.F.) is defined as M_X(t) = E[e^(tX)]. If it exists, it can be used to easily generate all the raw moments of a distribution.
The M.G.F. is defined only for values of 't' for which this expectation exists (i.e., the sum/integral converges).
The r-th raw moment (μ_r') is the r-th derivative of the M.G.F. with respect to t, evaluated at t = 0: μ_r' = M_X^(r)(0).
Why does this work? Expand e^(tX) using a Taylor series:
e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + ...
E[e^(tX)] = E[1 + tX + (t²X²)/2! + ...] = 1 + t·E[X] + (t²/2!)·E[X²] + ...
This is a power series in 't'. If you differentiate once w.r.t 't' and set t=0, only the coefficient of 't' (which is E[X]) remains. Differentiate twice, you get E[X²].
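A minimal numerical sketch of this idea: differentiate M(t) = E[e^(tX)] at t = 0 (here by finite differences) and the raw moments drop out. The two-point distribution is assumed purely for illustration.

```python
import math

# Toy distribution: X = 0 with prob 0.5, X = 2 with prob 0.5,
# so E[X] = 1 and E[X^2] = 2. Differentiate its M.G.F. at t = 0.
def M(t):
    return 0.5 * math.exp(t * 0) + 0.5 * math.exp(t * 2)

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)          # central difference ~ M'(0)  = E[X]
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # second difference ~ M''(0) = E[X^2]
print(round(m1, 4), round(m2, 4))
```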
The M.G.F. of a sum of independent variables is the product of their individual M.G.F.s.
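This multiplicative property can be verified at a sample point t for two assumed toy discrete distributions:

```python
import itertools
import math

# Independent X and Y as (value, probability) pairs -- made-up numbers.
xs = [(0, 0.5), (1, 0.5)]
ys = [(0, 0.3), (2, 0.7)]

def mgf(dist, t):
    return sum(p * math.exp(t * v) for v, p in dist)

t = 0.4
# Direct M.G.F. of the sum, using independence: P(X=x, Y=y) = P(X=x)P(Y=y).
lhs = sum(px * py * math.exp(t * (x + y))
          for (x, px), (y, py) in itertools.product(xs, ys))
rhs = mgf(xs, t) * mgf(ys, t)
print(abs(lhs - rhs) < 1e-12)  # True
```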
The Cumulant Generating Function (C.G.F.) is simply the natural logarithm of the M.G.F.: K_X(t) = log(M_X(t)). It is used to generate cumulants.
The r-th cumulant (κ_r) is the r-th derivative of the C.G.F. with respect to t, evaluated at t = 0.
If X and Y are independent, K_{X+Y}(t) = log(M_{X+Y}(t)) = log(M_X(t) · M_Y(t)) = log(M_X(t)) + log(M_Y(t)) = K_X(t) + K_Y(t).
The C.G.F. of a sum of independent variables is the sum of their individual C.G.F.s. This is the additivity property of cumulants.
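A sketch of additivity for the second cumulant (which is the variance), using assumed toy distributions: the second derivative of K(t) = log M(t) at 0 for X + Y should match κ₂(X) + κ₂(Y).

```python
import math

def mgf(dist, t):
    return sum(p * math.exp(t * v) for v, p in dist)

def kappa2(dist, h=1e-4):
    # Second derivative of the C.G.F. K(t) = log M(t) at t = 0,
    # approximated by a central second difference.
    K = lambda t: math.log(mgf(dist, t))
    return (K(h) - 2 * K(0) + K(-h)) / h**2

die = [(v, 1 / 6) for v in range(1, 7)]   # fair die, kappa2 = 35/12
coin = [(0, 0.5), (1, 0.5)]               # fair coin, kappa2 = 1/4
# Distribution of die + coin, using independence.
total = [(d + c, pd * pc) for d, pd in die for c, pc in coin]

print(round(kappa2(die) + kappa2(coin), 4), round(kappa2(total), 4))
```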
The M.G.F. has a major drawback: it doesn't exist for all distributions (e.g., the Cauchy distribution). The Characteristic Function (C.F.) solves this problem. It *always* exists for *every* distribution.
It is defined using the complex number i = √(-1): φ_X(t) = E[e^(itX)]. Because |e^(itX)| = 1, this expectation always converges.
The C.F. has the same properties as the M.G.F. (e.g., for sums of independent variables, C.F.s multiply). It can also be used to find moments: E[X^r] = (1/i^r) · φ_X^(r)(0), where φ_X^(r) denotes the r-th derivative.
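A small sketch of recovering a mean from the C.F., on an assumed toy distribution (a fair die): φ'(0) = i·E[X], so dividing the numerical derivative by i gives E[X].

```python
import cmath

# Characteristic function of a fair six-sided die: phi(t) = E[e^{itX}].
def phi(t):
    return sum(cmath.exp(1j * t * v) / 6 for v in range(1, 7))

h = 1e-6
d1 = (phi(h) - phi(-h)) / (2 * h)  # ~ phi'(0), a purely imaginary number
mean = (d1 / 1j).real              # (1/i) * phi'(0) = E[X] = 3.5
print(round(mean, 4))
```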
These theorems are the reason generating functions are so important. The syllabus lists them without proof, so you only need to understand their application.
Theorem: If two random variables X and Y have M.G.F.s MX(t) and MY(t) that are equal (i.e., MX(t) = MY(t) for all t in an open interval around 0), then X and Y have the same probability distribution.
The same theorem holds for Characteristic Functions, and it's more powerful because the C.F. always exists.
Example: If X ~ Normal(μ₁, σ₁²) and Y ~ Normal(μ₂, σ₂²) are independent, their MGFs multiply. The resulting MGF is recognizable as a Normal MGF with mean (μ₁+μ₂) and variance (σ₁²+σ₂²). Therefore, (X+Y) is also normally distributed.
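A Monte Carlo sketch of this example (not a proof), with assumed parameters (μ₁, σ₁²) = (1, 4) and (μ₂, σ₂²) = (3, 9): samples of X + Y should have mean ≈ 4 and variance ≈ 13.

```python
import random
import statistics

random.seed(0)
# random.gauss takes the standard deviation, so sigma = 2 and 3 here.
s = [random.gauss(1, 2) + random.gauss(3, 3) for _ in range(200_000)]
print(round(statistics.fmean(s), 2), round(statistics.pvariance(s), 1))
```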
Theorem: Given a Characteristic Function φX(t), there is a unique corresponding p.d.f./p.m.f. f(x).
This means the C.F. contains all the information about the distribution. The "inversion formula" (which you don't need) is a way to retrieve the p.d.f. from the C.F. using an integral. The key takeaway is that the C.F. uniquely defines the distribution.
Conditional Expectation, E[Y | X=x], is the expected value (mean) of Y, calculated using the conditional distribution f(y|x). It represents our best guess for Y, *given that we know* X has taken the value x.
Important: E[Y | X=x] is a function of x. If we let 'x' be random again, we get E[Y | X], which is a random variable (its value depends on X).
This law connects conditional expectation back to the marginal expectation. It states that the overall average (E[Y]) is the average of the conditional averages (E[Y|X]).
In practice: You find E[Y|X=x], which is a function of x. Then you take the expectation of that function with respect to the distribution of X.
E[Y] = E[E[Y | X]] = ∫ E[Y | X=x] · f_X(x) dx (with the integral replaced by a sum in the discrete case).
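A discrete sketch of the law, with made-up numbers: the conditional means E[Y | X=x] are averaged with weights P(X=x).

```python
# Assumed two-stage setup: X picks a "box", and each box has its own mean of Y.
p_x = {0: 0.4, 1: 0.6}          # marginal distribution of X
e_y_given_x = {0: 2.0, 1: 5.0}  # conditional means E[Y | X=x] (made up)

# Law of total expectation: E[Y] = sum over x of E[Y | X=x] * P(X=x).
e_y = sum(e_y_given_x[x] * p for x, p in p_x.items())
print(round(e_y, 4))  # 0.4*2 + 0.6*5 = 3.8
```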
Conditional Variance, Var(Y | X=x), is the variance of Y, calculated using the conditional distribution f(y|x). It measures the "spread" or "uncertainty" of Y, *given that we know* X=x.
Computational Formula: Var(Y | X=x) = E[Y² | X=x] - (E[Y | X=x])².
Like conditional expectation, Var(Y | X) is a random variable (a function of X).
This law decomposes the total variance of Y into two parts: Var(Y) = E[Var(Y | X)] + Var(E[Y | X]). The first term is the "unexplained" variance (the average spread of Y remaining within each value of X); the second is the "explained" variance (how much the conditional means E[Y | X] vary with X).
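The decomposition Var(Y) = E[Var(Y | X)] + Var(E[Y | X]) can be checked exactly on a small made-up joint p.m.f.:

```python
# Assumed joint distribution: joint[(x, y)] = P(X=x, Y=y), summing to 1.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
xs = {x for x, _ in joint}

# Marginal of X, and conditional moments of Y given X = x.
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in xs}
ey_x = {x: sum(p * y for (xx, y), p in joint.items() if xx == x) / p_x[x]
        for x in xs}
ey2_x = {x: sum(p * y**2 for (xx, y), p in joint.items() if xx == x) / p_x[x]
         for x in xs}
var_y_x = {x: ey2_x[x] - ey_x[x]**2 for x in xs}     # Var(Y | X=x)

# Total variance of Y, directly from the joint distribution.
e_y = sum(p * y for (_, y), p in joint.items())
e_y2 = sum(p * y**2 for (_, y), p in joint.items())
total_var = e_y2 - e_y**2

unexplained = sum(p_x[x] * var_y_x[x] for x in xs)   # E[Var(Y | X)]
explained = sum(p_x[x] * (ey_x[x] - e_y)**2 for x in xs)  # Var(E[Y | X])
print(abs(total_var - (explained + unexplained)) < 1e-9)  # True
```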