Unit 2: Mathematical Expectation

Table of Contents

2.1 Mathematical Expectation of a Random Variable
2.2 Properties of Expectation (Addition & Multiplication Theorems)
2.3 Variance and Covariance
2.4 Expectation of a Bivariate Random Variable
2.5 Solved Examples

Definition of Expectation (E[X])

The mathematical expectation (or expected value, or mean) of a random variable X is the weighted average of all possible values that X can take, with the weights being their respective probabilities:

E[X] = Σ x * p(x)         (discrete case)
E[X] = ∫ x * f(x) dx      (continuous case)

It represents the long-run average value of X if the experiment were repeated many times. It is often denoted as μ (mu).

Expectation of a Function g(X)

We often need the expectation of a function of X, say Y = g(X). We can find E[Y] *without* first finding the p.d.f. of Y, using the Law of the Unconscious Statistician (LOTUS):

E[g(X)] = Σ g(x) * p(x)         (discrete case)
E[g(X)] = ∫ g(x) * f(x) dx      (continuous case)

Example: Let X be a discrete RV with p.m.f. p(0)=0.5, p(1)=0.3, p(2)=0.2.
Find E[X] and E[X²].

E[X] = 0(0.5) + 1(0.3) + 2(0.2) = 0.7
E[X²] = 0²(0.5) + 1²(0.3) + 2²(0.2) = 1.1
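As a quick sanity check, this example can be computed in a few lines of plain Python (no external libraries; the dictionary simply encodes the p.m.f. given above):

```python
# p.m.f. from the example: p(0)=0.5, p(1)=0.3, p(2)=0.2
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

# E[X] = sum of x * p(x) over the support
E_X = sum(x * p for x, p in pmf.items())

# LOTUS: E[g(X)] = sum of g(x) * p(x); here g(x) = x^2
E_X2 = sum(x**2 * p for x, p in pmf.items())

print(round(E_X, 6))   # 0.7
print(round(E_X2, 6))  # 1.1
```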

2.2 Properties of Expectation (Addition & Multiplication Theorems)

Expectation has several key properties that make it a powerful tool.

  1. E[c] = c

    The expected value of a constant (c) is just the constant itself. (e.g., E[5] = 5).

  2. E[c * X] = c * E[X]

    Constants can be factored out of an expectation.

  3. E[aX + b] = a * E[X] + b (from properties 1 and 2)

    Expectation is a linear operator.

  4. Addition Theorem: E[X + Y] = E[X] + E[Y]

    The expectation of a sum is the sum of the expectations. This is known as the Linearity of Expectation.

    Crucial Exam Point: The Addition Theorem E[X + Y] = E[X] + E[Y] holds whether or not X and Y are independent. This is a very powerful property and a common exam question.

  5. Multiplication Theorem: E[X * Y] = E[X] * E[Y] (if X and Y are independent)

    The expectation of a product is the product of the expectations *only if* the variables are independent.

    Warning: The reverse is not true! If E[XY] = E[X]E[Y], it does not necessarily mean X and Y are independent. It only means they are uncorrelated (a weaker condition, which we'll see next).
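Both points can be illustrated numerically on a deliberately dependent pair. The joint p.m.f. below is hypothetical, chosen so that Y always equals X: linearity of expectation still holds, while the multiplication theorem fails.

```python
# Joint p.m.f. with Y = X: the pair is as dependent as possible.
joint = {(0, 0): 0.5, (1, 1): 0.5}  # P(X=x, Y=y)

E_X   = sum(x * p for (x, y), p in joint.items())
E_Y   = sum(y * p for (x, y), p in joint.items())
E_sum = sum((x + y) * p for (x, y), p in joint.items())
E_XY  = sum(x * y * p for (x, y), p in joint.items())

assert E_sum == E_X + E_Y   # addition theorem holds despite dependence
assert E_XY != E_X * E_Y    # 0.5 vs 0.25: product rule fails here
```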

2.3 Variance and Covariance

Variance (Var(X) or σ²)

While expectation tells us the "center" of a distribution, variance tells us about its "spread" or "dispersion." It is defined as:

Var(X) = E[(X - μ)²] = E[X²] - (E[X])²

A small variance means data points are clustered tightly around the mean. A large variance means they are spread out.

Properties of Variance

  1. Var(X) ≥ 0 (Variance can never be negative, as it's an expectation of a squared value).
  2. Var(c) = 0 (A constant has no spread, so its variance is zero).
  3. Var(aX + b) = a² * Var(X)
    • Adding a constant 'b' shifts the distribution but doesn't change its spread (Var(X+b) = Var(X)).
    • Multiplying by 'a' scales the spread by a² (Var(aX) = a²Var(X)).
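The scaling property can be verified numerically. The sketch below reuses the p.m.f. from the example in 2.1, with the arbitrary choice a = 3, b = 5:

```python
# p.m.f. from the example in 2.1
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

def expect(g):
    """E[g(X)] by LOTUS: sum of g(x) * p(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

# Var(X) = E[X^2] - (E[X])^2, and the same formula applied to Y = 3X + 5
var_X = expect(lambda x: x**2) - expect(lambda x: x)**2
var_Y = expect(lambda x: (3*x + 5)**2) - expect(lambda x: 3*x + 5)**2

assert abs(var_Y - 9 * var_X) < 1e-9   # a^2 = 9; the +5 shift drops out
```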

Covariance (Cov(X, Y))

Covariance measures the joint variability of two random variables, (X, Y). It describes the direction of the linear relationship between them:

Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] * E[Y]

A positive covariance means X and Y tend to move in the same direction; a negative covariance means they tend to move in opposite directions.

Independence vs. Uncorrelated:

If X and Y are independent, then Cov(X, Y) = 0 (they are uncorrelated). The converse is false: uncorrelated variables need not be independent. Classic counterexample: let X take the values -1, 0, 1 with equal probability and let Y = X². Then E[XY] = E[X³] = 0 = E[X]E[Y], so Cov(X, Y) = 0, yet Y is completely determined by X.

Variance of a Sum (General Case)

Using covariance, we can state the general formula for the variance of a sum:

Var(X + Y) = Var(X) + Var(Y) + 2 * Cov(X, Y)
Var(X - Y) = Var(X) + Var(Y) - 2 * Cov(X, Y)
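These formulas can be checked by direct enumeration on a small dependent joint p.m.f. (the values below are made up for illustration), computing every term via bivariate LOTUS:

```python
# Hypothetical dependent joint p.m.f. on {0,1} x {0,1}
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """E[g(X, Y)] by bivariate LOTUS: sum of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

var_X = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_Y = E(lambda x, y: y**2) - E(lambda x, y: y)**2
cov   = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
var_S = E(lambda x, y: (x + y)**2) - E(lambda x, y: x + y)**2

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
assert abs(var_S - (var_X + var_Y + 2 * cov)) < 1e-9
```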

If X and Y are independent, then Cov(X, Y) = 0, and the formulas simplify:

Var(X + Y) = Var(X) + Var(Y) (if independent)

Var(X - Y) = Var(X) + Var(Y) (if independent)

Warning: A very common mistake is to write Var(X - Y) = Var(X) - Var(Y). This is WRONG. Variance measures spread (a squared quantity), and the uncertainty in a difference is just as large as in a sum, so the variances always add.
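To see that the variances add even for a difference, the sketch below builds an independent pair by taking the joint p.m.f. as a product of made-up marginals, then enumerates the distribution of X - Y:

```python
from collections import defaultdict

# Independence built in: joint p.m.f. is the product of the marginals
px = {0: 0.5, 1: 0.5}
py = {0: 0.3, 1: 0.4, 2: 0.3}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

def var(dist):
    """Variance of a distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    return sum((v - m)**2 * p for v, p in dist.items())

# Distribution of D = X - Y by enumeration
diff = defaultdict(float)
for (x, y), p in joint.items():
    diff[x - y] += p

# Var(X - Y) = Var(X) + Var(Y): the variances add, never subtract
assert abs(var(dict(diff)) - (var(px) + var(py))) < 1e-9
```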

2.4 Expectation of a Bivariate Random Variable

This is simply an application of the Law of the Unconscious Statistician (LOTUS) for a function of two variables, g(X, Y):

E[g(X, Y)] = Σ Σ g(x, y) * p(x, y)            (discrete case)
E[g(X, Y)] = ∫ ∫ g(x, y) * f(x, y) dx dy      (continuous case)

The formulas for E[X+Y], E[XY], and Cov(X,Y) are all special cases of this.

2.5 Solved Examples

Example: Let (X, Y) have the joint p.d.f. f(x, y) = 2 for 0 < x < y < 1, and 0 otherwise.

Find E[X] and E[Y].

1. Find Marginal PDFs first:

f_X(x) = ∫[y = x to 1] 2 dy = 2(1 - x),  for 0 < x < 1
f_Y(y) = ∫[x = 0 to y] 2 dx = 2y,        for 0 < y < 1

2. Calculate Expectations using Marginals:

E[X] = ∫[0 to 1] x * 2(1 - x) dx = 2(1/2 - 1/3) = 1/3
E[Y] = ∫[0 to 1] y * 2y dy = 2/3

Alternative (using the joint p.d.f. directly):

E[X] = ∫[x = 0 to 1] ∫[y = x to 1] x * 2 dy dx = ∫[0 to 1] 2x(1 - x) dx = 1/3
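A Monte Carlo sanity check for this example (the true values work out to E[X] = 1/3 and E[Y] = 2/3): a uniform point on the unit square, kept only when its first coordinate is smaller than its second, is uniform on the triangle 0 < x < y < 1 and therefore has exactly the joint density f(x, y) = 2.

```python
import random
random.seed(0)

# Accept/reject sampling: keep (u, v) only when u < v, i.e. the point
# lands in the triangle 0 < x < y < 1 (area 1/2, so density 2 on it).
xs, ys = [], []
while len(xs) < 100_000:
    u, v = random.random(), random.random()
    if u < v:
        xs.append(u)
        ys.append(v)

print(round(sum(xs) / len(xs), 2))   # ≈ 0.33 (true value 1/3)
print(round(sum(ys) / len(ys), 2))   # ≈ 0.67 (true value 2/3)
```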