Unit 1: Random Variables and Distributions

1.1 Univariate Random Variables
1.2 PMF, PDF, and CDF
1.3 Univariate Transformations
1.4 Two-Dimensional (Bivariate) Random Variables
1.5 Joint, Marginal, and Conditional Distributions
1.6 Independence of Variables
1.7 Bivariate Transformations

1.1 Univariate Random Variables

Definition of a Random Variable

A random variable (often denoted as X, Y, etc.) is a function that assigns a unique numerical value to each possible outcome in the sample space (S) of a random experiment.

It's a "variable" because it can take different values, and "random" because the specific value it takes is determined by the outcome of a random phenomenon.

Example: Consider tossing two fair coins.

The sample space is S = {HH, HT, TH, TT}.

Let the random variable X be "the number of heads."

X is a function that maps outcomes to numbers:

X(HH) = 2

X(HT) = 1

X(TH) = 1

X(TT) = 0

The possible values for X are {0, 1, 2}.
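
The mapping above can be written out directly. The following is a minimal Python sketch; the names sample_space, X, mapping, and support are illustrative choices, not part of the notes.

```python
from itertools import product

# Sample space for two coin tosses: every ordered pair of H/T.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

def X(outcome):
    """Random variable: number of heads in an outcome such as 'HT'."""
    return outcome.count("H")

# X assigns one number to each outcome in the sample space.
mapping = {outcome: X(outcome) for outcome in sample_space}

# The set of values X can take.
support = sorted(set(mapping.values()))

print(mapping)
print(support)
```

Note that two different outcomes (HT and TH) map to the same value 1, which is why X has three possible values even though S has four outcomes.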

Discrete vs. Continuous Random Variables

Random variables are broadly classified into two types:

Type	Definition	Values are...	Examples
Discrete	A random variable that can take on a finite or countably infinite number of distinct values.	Counted. There are "gaps" between possible values.	Number of heads in 3 coin flips (0, 1, 2, 3); number of cars passing a toll booth in an hour (0, 1, 2, ...); number of defective items in a batch (0, 1, ..., n)
Continuous	A random variable that can take on any value within a given range or interval.	Measured. There are no gaps between values.	Height of a student (e.g., any value between 150 cm and 190 cm); temperature of a room; time until a light bulb burns out

1.2 PMF, PDF, and CDF

We describe the probability of different values of a random variable using a distribution function.

Probability Mass Function (p.m.f.)

Used for: Discrete Random Variables.
Definition: A function p(x) that gives the probability that the discrete random variable X is exactly equal to some value x.

p(x) = P(X = x)

Properties:
1. p(x) ≥ 0 for all x (Probabilities can't be negative).
2. Σ p(x) = 1 (The sum of probabilities for all possible outcomes must be 1).
Example (Two Coin Toss): For X = number of heads, the p.m.f. is:
- p(0) = P(X=0) = P(TT) = 1/4
- p(1) = P(X=1) = P(HT or TH) = 2/4 = 1/2
- p(2) = P(X=2) = P(HH) = 1/4
- Check: (1/4) + (1/2) + (1/4) = 1.
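
As a quick check, the p.m.f. can be built by counting equally likely outcomes. This sketch assumes fair coins and uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = (number of outcomes with X = x) / (total number of outcomes).
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# Both p.m.f. properties: non-negative values that sum to 1.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

print(pmf[0], pmf[1], pmf[2])
```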

Probability Density Function (p.d.f.)

Used for: Continuous Random Variables.
Definition: A function f(x) where the area under the curve between two points (a, b) gives the probability that X falls within that interval.

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Properties:
1. f(x) ≥ 0 for all x (The density curve cannot dip below the x-axis).
2. ∫_-∞^+∞ f(x) dx = 1 (The total area under the entire curve must be 1).

Common Mistake: Treating f(a) as the probability that X = a. For a continuous random variable, the probability of any single, exact value is zero:

P(X = a) = ∫_a^a f(x) dx = 0.

This is because there is no "area" under a single point; f(a) is a density, not a probability, and it can even exceed 1. Nonzero probabilities arise only over intervals, which also means P(a ≤ X ≤ b) is the same as P(a < X < b).
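
To make the "probability is area" idea concrete, here is a small numerical sketch. It assumes an illustrative density f(x) = 2x on (0, 1) and a hand-written midpoint-rule helper called integrate; both are assumptions for the example.

```python
def f(x):
    """Illustrative density (an assumption): f(x) = 2x on (0, 1), else 0."""
    return 2 * x if 0 < x < 1 else 0.0

def integrate(func, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

total = integrate(f, 0, 1)            # total area under the density
p_interval = integrate(f, 0.25, 0.5)  # P(0.25 <= X <= 0.5) = 0.5**2 - 0.25**2
p_point = integrate(f, 0.5, 0.5)      # P(X = 0.5): an interval of zero width

print(total, p_interval, p_point)
```

Here f(0.5) = 1, yet the probability of the single point 0.5 is 0, because the interval of integration has zero width.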

Cumulative Distribution Function (c.d.f.)

Used for: Both Discrete and Continuous Random Variables.
Definition: A function F(x) that gives the probability that the random variable X is less than or equal to a specific value x.

F(x) = P(X ≤ x)

For Discrete X: F(x) = Σ_t≤x p(t)
- The C.D.F. is a "step function" that jumps up at each possible value of X.
For Continuous X: F(x) = ∫_-∞^x f(t) dt
- The C.D.F. is a continuous, non-decreasing function.
- Relationship to p.d.f.: You can get the p.d.f. by differentiating the c.d.f.: f(x) = d/dx F(x).
Universal Properties of C.D.F.:
1. 0 ≤ F(x) ≤ 1 (It is a probability).
2. F(x) is non-decreasing (i.e., if a < b, then F(a) ≤ F(b)).
3. lim_x→-∞ F(x) = 0 (As x decreases without bound, the event X ≤ x becomes impossible).
4. lim_x→+∞ F(x) = 1 (As x increases without bound, the event X ≤ x becomes certain).
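
The step-function behaviour of a discrete c.d.f. can be seen by evaluating it at and between the possible values. This sketch reuses the two-coin p.m.f.; the function name cdf is an illustrative choice.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): add up p(t) over all possible values t <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# F is flat between possible values and jumps at 0, 1, and 2.
points = [-1, 0, 0.5, 1, 1.5, 2, 10]
values = [cdf(x) for x in points]

# Non-decreasing, and bounded between 0 and 1.
assert all(a <= b for a, b in zip(values, values[1:]))
assert values[0] == 0 and values[-1] == 1

print(values)
```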

1.3 Univariate Transformations

Often, we are interested in a function of a random variable. If we know the distribution of X, can we find the distribution of Y = g(X)?

Discrete Case

This is straightforward. The p.m.f. of Y is found by summing the probabilities of all x values that map to a given y value.

p_Y(y) = P(Y=y) = P(g(X) = y) = Σ_{x : g(x)=y} p_X(x)

Example: Let X have p.m.f. p(-1)=0.1, p(0)=0.3, p(1)=0.4, p(2)=0.2.
Find the p.m.f. of Y = X².

Y = X² takes the values {(-1)², 0², 1², 2²} = {1, 0, 1, 4}, so the distinct possible values of Y are {0, 1, 4}.

p_Y(0) = P(Y=0) = P(X²=0) = P(X=0) = 0.3

p_Y(1) = P(Y=1) = P(X²=1) = P(X=-1 or X=1) = p_X(-1) + p_X(1) = 0.1 + 0.4 = 0.5

p_Y(4) = P(Y=4) = P(X²=4) = P(X=2) = 0.2

The p.m.f. for Y is: p_Y(0)=0.3, p_Y(1)=0.5, p_Y(4)=0.2. (Check: 0.3+0.5+0.2 = 1).
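
The same bookkeeping can be sketched in a few lines of Python. The helper name transform_pmf is hypothetical; it simply adds up p_X(x) over every x that maps to the same y.

```python
from collections import defaultdict

# p.m.f. of X from the example.
p_X = {-1: 0.1, 0: 0.3, 1: 0.4, 2: 0.2}

def transform_pmf(pmf, g):
    """p.m.f. of Y = g(X): sum p_X(x) over all x with g(x) = y."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

p_Y = transform_pmf(p_X, lambda x: x ** 2)

# Both x = -1 and x = 1 contribute to y = 1.
print(p_Y)
```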

Continuous Case (Change of Variable Technique)

This is more complex and requires calculus. If Y = g(X), where g is differentiable and strictly monotonic (strictly increasing or strictly decreasing) on the range of X, we can find the p.d.f. of Y as follows.

Find the inverse function: x = g⁻¹(y).
Find the derivative of the inverse function: dx/dy.
The p.d.f. of Y is given by the formula:

f_Y(y) = f_X(g⁻¹(y)) * |dx/dy|

The |dx/dy| term is the absolute value of the Jacobian of the transformation. It rescales the density so that the total area under f_Y remains 1.

Example: Let X be a continuous RV with p.d.f. f_X(x) = 2x, for 0 < x < 1.
Find the p.d.f. of Y = 8X³.

Find range of Y: If 0 < x < 1, then 0 < 8x³ < 8. So, 0 < y < 8.

Find inverse: y = 8x³ => x³ = y/8 => x = (y/8)^(1/3) = y^(1/3) / 2. So, g⁻¹(y) = y^(1/3) / 2.

Find derivative: dx/dy = d/dy ( (1/2) * y^(1/3) ) = (1/2) * (1/3) * y^(-2/3) = 1 / (6y^(2/3)).

Apply formula:

f_Y(y) = f_X(g⁻¹(y)) * |dx/dy|

f_Y(y) = 2(g⁻¹(y)) * |1 / (6y^(2/3))|

f_Y(y) = 2(y^(1/3) / 2) * (1 / (6y^(2/3))) (Since y > 0, absolute value is not needed)

f_Y(y) = (y^(1/3)) * (1 / (6y^(2/3))) = y^(1/3 - 2/3) / 6 = y^(-1/3) / 6

Final PDF: f_Y(y) = 1 / (6y^(1/3)), for 0 < y < 8.
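
One way to sanity-check a result like this is to compute the same probability two ways: by integrating the derived density of Y, and by mapping the interval back to X and using the c.d.f. of X (F_X(x) = x² for the density 2x on (0, 1)). The sketch below does this for P(1 ≤ Y ≤ 8); the helper names are illustrative, and the midpoint rule is just one convenient numerical method.

```python
def f_Y(y):
    """Derived density of Y = 8X^3: 1 / (6 y^(1/3)) on (0, 8)."""
    return 1 / (6 * y ** (1 / 3)) if 0 < y < 8 else 0.0

def F_X(x):
    """c.d.f. of X when f_X(x) = 2x on (0, 1): F_X(x) = x^2 there."""
    x = min(max(x, 0.0), 1.0)
    return x * x

def g_inverse(y):
    """Inverse of y = 8x^3."""
    return y ** (1 / 3) / 2

def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

a, b = 1.0, 8.0
via_density = integrate(f_Y, a, b)               # integrate f_Y over [1, 8]
via_cdf = F_X(g_inverse(b)) - F_X(g_inverse(a))  # P(0.5 <= X <= 1)

print(via_density, via_cdf)
```

Both routes give approximately 0.75, which is what the exact antiderivative y^(2/3) / 4 yields between 1 and 8.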

1.4 Two-Dimensional (Bivariate) Random Variables

We often need to study two or more random variables simultaneously. A bivariate random variable is an ordered pair (X, Y) that maps each outcome in a sample space S to a point in the 2D plane.

(Discrete, Discrete): (Number of heads, Number of tails).
(Continuous, Continuous): (Height, Weight).
(Discrete, Continuous): (Number of children in a family, Annual income).

1.5 Joint, Marginal, and Conditional Distributions

Joint p.m.f. and p.d.f.

This is the 2D equivalent of a p.m.f./p.d.f. It describes the probability of X and Y *simultaneously* taking on certain values.

Joint p.m.f. (Discrete): p(x, y) = P(X=x, Y=y)
- Properties: 1. p(x,y) ≥ 0, 2. Σ_x Σ_y p(x,y) = 1
Joint p.d.f. (Continuous): f(x, y)
- Properties: 1. f(x,y) ≥ 0, 2. ∫_-∞^+∞ ∫_-∞^+∞ f(x,y) dx dy = 1
- Probability is volume: P(a < X < b, c < Y < d) = ∫_c^d ∫_a^b f(x,y) dx dy.
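
As a rough illustration of probability as volume, the sketch below integrates a joint density over a rectangle. The density f(x, y) = x + y on the unit square is an assumed example (it is non-negative and integrates to 1), and integrate2d is a hypothetical midpoint-rule helper.

```python
def f(x, y):
    """Illustrative joint density (an assumption): x + y on the unit square."""
    return x + y if 0 < x < 1 and 0 < y < 1 else 0.0

def integrate2d(func, a, b, c, d, n=200):
    """Midpoint-rule approximation of the double integral over [a,b] x [c,d]."""
    hx = (b - a) / n
    hy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += func(x, y)
    return total * hx * hy

total_volume = integrate2d(f, 0, 1, 0, 1)   # should be 1 for a valid joint p.d.f.
p_rect = integrate2d(f, 0, 0.5, 0, 0.5)     # P(0 < X < 0.5, 0 < Y < 0.5)

print(total_volume, p_rect)
```

For this density the exact rectangle probability is 1/8, which the numerical result should match closely.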

Joint c.d.f.

F(x, y) = P(X ≤ x, Y ≤ y)

Discrete: F(x, y) = Σ_s≤x Σ_t≤y p(s, t)
Continuous: F(x, y) = ∫_-∞^y ∫_-∞^x f(s, t) ds dt
We can get the joint p.d.f. from the c.d.f.: f(x,y) = ∂²F(x,y) / (∂x ∂y).

Marginal Distributions

The marginal distribution of X is the individual probability distribution of X, "ignoring" Y. We get it by "summing out" or "integrating out" the other variable from the joint distribution.

Marginal p.m.f. for X (Discrete):
p_X(x) = P(X=x) = Σ_y p(x, y)

Think of this as adding up the entries in the row for a fixed x in a joint probability table whose rows are indexed by x and whose columns are indexed by y.
Marginal p.d.f. for X (Continuous):
f_X(x) = ∫_-∞^+∞ f(x, y) dy

This gives the individual p.d.f. for X. The same logic applies for finding the marginal distribution of Y (sum/integrate over x).
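
"Summing out" a variable is easy to see on a small joint table. In this sketch, X is the number of heads in two fair coin tosses and Y is 1 if the first toss is heads (Y is an assumption added for illustration); the helper name marginal is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

quarter = Fraction(1, 4)

# Joint p.m.f. p(x, y), keyed by (x, y). Outcomes: HH, HT, TH, TT.
joint = {
    (2, 1): quarter,  # HH
    (1, 1): quarter,  # HT
    (1, 0): quarter,  # TH
    (0, 0): quarter,  # TT
}

def marginal(joint_pmf, axis):
    """Sum the joint p.m.f. over the other variable (axis 0 -> X, axis 1 -> Y)."""
    out = defaultdict(Fraction)
    for pair, p in joint_pmf.items():
        out[pair[axis]] += p
    return dict(out)

p_X = marginal(joint, 0)
p_Y = marginal(joint, 1)

print(p_X)
print(p_Y)
```

The marginal of X recovers the two-coin p.m.f. (1/4, 1/2, 1/4), and the marginal of Y is 1/2 for each of 0 and 1.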

Conditional Distributions

The conditional distribution describes the probability of one variable *given that we know* the value of the other. It's like taking a "slice" of the joint distribution.

Conditional p.m.f. of Y given X=x (defined when p_X(x) > 0):
p(y | x) = P(Y=y | X=x) = P(X=x, Y=y) / P(X=x) = p(x, y) / p_X(x)
Conditional p.d.f. of Y given X=x (defined when f_X(x) > 0):
f(y | x) = f(x, y) / f_X(x)

Key Relationship: The joint distribution is the product of the marginal and the conditional.

f(x, y) = f(y | x) * f_X(x) and f(x, y) = f(x | y) * f_Y(y)

This is just a rearrangement of the conditional formula and is extremely useful in proofs.
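
The "slice and renormalise" idea, and the product relationship, can be checked on a small discrete example. Here X is the number of heads in two fair coin tosses and Y is 1 if the first toss is heads; Y and the helper name conditional_y_given_x are assumptions made for illustration.

```python
from fractions import Fraction

quarter = Fraction(1, 4)

# Joint p.m.f. p(x, y), keyed by (x, y).
joint = {(2, 1): quarter, (1, 1): quarter, (1, 0): quarter, (0, 0): quarter}

def conditional_y_given_x(joint_pmf, x):
    """p(y | x) = p(x, y) / p_X(x), defined only when p_X(x) > 0."""
    p_x = sum(p for (xv, _), p in joint_pmf.items() if xv == x)
    if p_x == 0:
        raise ValueError("conditioning on an event of probability zero")
    return {yv: p / p_x for (xv, yv), p in joint_pmf.items() if xv == x}

given_one_head = conditional_y_given_x(joint, 1)
given_two_heads = conditional_y_given_x(joint, 2)

# Product rule: p(x, y) = p(y | x) * p_X(x) for every cell in the table.
for (x, y), p in joint.items():
    p_x = sum(q for (xv, _), q in joint.items() if xv == x)
    assert conditional_y_given_x(joint, x)[y] * p_x == p

print(given_one_head)
print(given_two_heads)
```

Given exactly one head, the first toss is equally likely to be heads or tails; given two heads, the first toss must be heads.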

1.6 Independence of Variables

Definition: Two random variables X and Y are independent if and only if their joint distribution function factors into the product of their individual marginal distribution functions.

For all (x, y):
- Joint c.d.f.: F(x, y) = F_X(x) * F_Y(y)
- Joint p.m.f./p.d.f.: f(x, y) = f_X(x) * f_Y(y)

If X and Y are independent, then the conditional distribution is equal to the marginal distribution:

f(y | x) = f(x, y) / f_X(x) = (f_X(x) * f_Y(y)) / f_X(x) = f_Y(y)

This makes intuitive sense: if the variables are independent, knowing the value of X gives you no new information about Y.
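
For discrete variables, the factorisation condition can be tested cell by cell. The sketch below compares two joint tables built from two fair coin tosses; the table contents and the helper names marginals and is_independent are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def marginals(joint_pmf):
    """Return (p_X, p_Y) by summing the joint p.m.f. over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in joint_pmf.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint_pmf):
    """True if p(x, y) = p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginals(joint_pmf)
    return all(
        joint_pmf.get((x, y), 0) == p_x[x] * p_y[y]
        for x, y in product(p_x, p_y)
    )

quarter = Fraction(1, 4)

# X = 1 if the first toss is heads, Y = 1 if the second toss is heads.
first_vs_second = {(a, b): quarter for a in (0, 1) for b in (0, 1)}

# X = number of heads, Y = 1 if the first toss is heads.
heads_vs_first = {(2, 1): quarter, (1, 1): quarter, (1, 0): quarter, (0, 0): quarter}

print(is_independent(first_vs_second))
print(is_independent(heads_vs_first))
```

The second table fails the test because, for instance, p(2, 0) = 0 while p_X(2) * p_Y(0) = 1/8: the condition must hold for all pairs, including those with zero joint probability.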

1.7 Bivariate Transformations

This extends the univariate case. We have (X, Y) and want to find the joint p.d.f. of new variables, U and V, where:

U = g₁(X, Y) and V = g₂(X, Y)

The Jacobian Method (Change of Variables)

Define the transformations: U = g₁(X, Y) and V = g₂(X, Y).
Solve for the inverse functions: X = h₁(U, V) and Y = h₂(U, V).
Calculate the Jacobian determinant (J) of the inverse transformation. This is the determinant of a matrix of partial derivatives:

J = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ] = (∂x/∂u)(∂y/∂v) - (∂x/∂v)(∂y/∂u)

(Here the semicolon separates the two rows of the 2×2 matrix.)

The new joint p.d.f. for U and V is:

f_U,V(u, v) = f_X,Y(h₁(u, v), h₂(u, v)) * |J|

You must also transform the domain (the range of possible x, y values) into the new domain for u, v.
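
The steps above can be sketched numerically for a simple case. This example assumes U = X + Y and V = X - Y with X and Y independent Uniform(0, 1), so the inverse is x = (u + v)/2, y = (u - v)/2; the transformation, the input density, and the finite-difference helper jacobian_det are all illustrative assumptions.

```python
def h1(u, v):
    """Inverse transformation for x when U = X + Y, V = X - Y."""
    return (u + v) / 2

def h2(u, v):
    """Inverse transformation for y when U = X + Y, V = X - Y."""
    return (u - v) / 2

def jacobian_det(hx, hy, u, v, eps=1e-6):
    """Central-difference estimate of (dx/du)(dy/dv) - (dx/dv)(dy/du)."""
    dx_du = (hx(u + eps, v) - hx(u - eps, v)) / (2 * eps)
    dx_dv = (hx(u, v + eps) - hx(u, v - eps)) / (2 * eps)
    dy_du = (hy(u + eps, v) - hy(u - eps, v)) / (2 * eps)
    dy_dv = (hy(u, v + eps) - hy(u, v - eps)) / (2 * eps)
    return dx_du * dy_dv - dx_dv * dy_du

def f_XY(x, y):
    """Assumed joint density: independent Uniform(0, 1) variables."""
    return 1.0 if 0 < x < 1 and 0 < y < 1 else 0.0

def f_UV(u, v):
    """Joint density of (U, V): f_XY at the inverse point, times |J|."""
    return f_XY(h1(u, v), h2(u, v)) * abs(jacobian_det(h1, h2, u, v))

J = jacobian_det(h1, h2, 1.0, 0.0)
inside = f_UV(1.0, 0.0)    # maps back to (x, y) = (0.5, 0.5)
outside = f_UV(1.0, 1.5)   # maps back to x = 1.25, outside the unit square

print(J, inside, outside)
```

Here J is -1/2, so the density of (U, V) is 1/2 wherever the inverse point lands inside the unit square and 0 elsewhere, which is the domain transformation the last step refers to.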

Classic Exam Problem: Let X and Y be independent Exponential(λ) variables. Find the joint distribution of U = X+Y and V = X/(X+Y).

You will find that U and V are independent, where U is a Gamma variable with shape 2 (and the same λ) and V is a Beta(1, 1) variable, which is the same as Uniform(0, 1). This is a very common and important transformation.
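
A sketch of the derivation, assuming λ is the rate parameter (so each marginal density is λe^(-λx) for x > 0), might run as follows.

```latex
\begin{aligned}
x &= uv, \qquad y = u(1 - v), \qquad u > 0,\ 0 < v < 1 \\[4pt]
J &= \det\begin{pmatrix}
       \partial x/\partial u & \partial x/\partial v \\
       \partial y/\partial u & \partial y/\partial v
     \end{pmatrix}
   = \det\begin{pmatrix} v & u \\ 1 - v & -u \end{pmatrix}
   = -uv - u(1 - v) = -u \\[4pt]
f_{X,Y}(x, y) &= \lambda^2 e^{-\lambda (x + y)}, \qquad x > 0,\ y > 0 \\[4pt]
f_{U,V}(u, v) &= f_{X,Y}\bigl(uv,\ u(1 - v)\bigr)\,|J|
   = \lambda^2 e^{-\lambda u}\, u
   = \underbrace{\lambda^2 u\, e^{-\lambda u}}_{\text{Gamma}(2,\ \lambda)\ \text{in } u}
     \cdot \underbrace{1}_{\text{Uniform}(0,1)\ \text{in } v}
\end{aligned}
```

Because the joint density factors into a function of u alone times a function of v alone on a rectangular domain (u > 0, 0 < v < 1), U and V are independent.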
