Unit 1: Random Variables and Distributions
1.1 Univariate Random Variables
Definition of a Random Variable
A random variable (often denoted as X, Y, etc.) is a function that assigns a unique numerical value to each possible outcome in the sample space (S) of a random experiment.
It's a "variable" because it can take different values, and "random" because the specific value it takes is determined by the outcome of a random phenomenon.
Example: Consider tossing two fair coins.
- The sample space is S = {HH, HT, TH, TT}.
- Let the random variable X be "the number of heads."
- X is a function that maps outcomes to numbers:
- X(HH) = 2
- X(HT) = 1
- X(TH) = 1
- X(TT) = 0
- The possible values for X are {0, 1, 2}.
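The mapping above can be sketched in Python by treating the random variable as an ordinary function on the sample space. The names `sample_space`, `X`, `values`, and `support` are illustrative choices, not notation from these notes.

```python
from itertools import product

# Sample space for two coin tosses: every ordered pair of H/T.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

def X(outcome: str) -> int:
    """Random variable: the number of heads in the outcome."""
    return outcome.count("H")

# Map each outcome to its numerical value, then collect the distinct values.
values = {outcome: X(outcome) for outcome in sample_space}
support = sorted(set(values.values()))

print(values)
print(support)
```

The randomness lives entirely in which outcome occurs; `X` itself is a deterministic function.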
Discrete vs. Continuous Random Variables
Random variables are broadly classified into two types:
| Type | Definition | Values are... | Examples |
|---|---|---|---|
| Discrete | A random variable that can take on a finite or countably infinite number of distinct values. | Counted. There are "gaps" between possible values. | Number of heads in two coin tosses; number of children in a family. |
| Continuous | A random variable that can take on any value within a given range or interval. | Measured. There are no gaps between values. | Height; weight; annual income. |
1.2 PMF, PDF, and CDF
We describe the probability of different values of a random variable using a distribution function.
Probability Mass Function (p.m.f.)
- Used for: Discrete Random Variables.
- Definition: A function p(x) that gives the probability that the discrete random variable X is exactly equal to some value x.
- Properties:
- p(x) ≥ 0 for all x (Probabilities can't be negative).
- Σ p(x) = 1 (The sum of probabilities for all possible outcomes must be 1).
- Example (Two Coin Toss): For X = number of heads, the p.m.f. is:
- p(0) = P(X=0) = P(TT) = 1/4
- p(1) = P(X=1) = P(HT or TH) = 2/4 = 1/2
- p(2) = P(X=2) = P(HH) = 1/4
- Check: (1/4) + (1/2) + (1/4) = 1.
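As a rough check of the two-coin p.m.f., the probabilities can be computed by counting equally likely outcomes. This sketch assumes fair coins (so all four outcomes have probability 1/4) and uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All four outcomes are equally likely for two fair coins.
outcomes = ["".join(t) for t in product("HT", repeat=2)]

# Count how many outcomes give each number of heads.
counts = Counter(o.count("H") for o in outcomes)

# p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in counts.items()}
total = sum(pmf.values())

print(pmf)
print(total)
```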
Probability Density Function (p.d.f.)
- Used for: Continuous Random Variables.
- Definition: A function f(x) where the area under the curve between two points (a, b) gives the probability that X falls within that interval.
- Properties:
- f(x) ≥ 0 for all x (The density curve cannot dip below the x-axis).
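- $\int_{-\infty}^{+\infty} f(x)\,dx = 1$ (The total area under the entire curve must be 1).
- Note: for a continuous random variable, the probability of any single point is zero: $P(X = a) = \int_a^a f(x)\,dx = 0$.
This is because there is no "area" under a single point. Only intervals of nonzero width can carry positive probability. This also means P(a ≤ X ≤ b) is the same as P(a < X < b).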
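To make "probability is area" concrete, here is a minimal numerical sketch. It assumes the example density f(x) = 2x on (0, 1), which is chosen here for illustration, and approximates areas with a midpoint Riemann sum. The helper name `prob` is hypothetical.

```python
def f(x: float) -> float:
    """Assumed example density: f(x) = 2x on (0, 1), zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def prob(a: float, b: float, n: int = 100_000) -> float:
    """Approximate P(a <= X <= b) as the area under f, via a midpoint sum."""
    if b <= a:
        return 0.0  # a zero-width interval has no area
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

total_area = prob(0.0, 1.0)    # should be close to 1
p_interval = prob(0.25, 0.5)   # exact value: 0.5**2 - 0.25**2 = 0.1875
p_point = prob(0.3, 0.3)       # single point: zero area

print(total_area, p_interval, p_point)
```

Note that f(x) itself is not a probability: here f(0.9) = 1.8, which exceeds 1, yet every area under the curve stays between 0 and 1.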
Cumulative Distribution Function (c.d.f.)
- Used for: Both Discrete and Continuous Random Variables.
- Definition: A function F(x) that gives the probability that the random variable X is less than or equal to a specific value x.
- For Discrete X: $F(x) = \sum_{t \le x} p(t)$
- The C.D.F. is a "step function" that jumps up at each possible value of X.
- For Continuous X: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
- The C.D.F. is a continuous, non-decreasing function.
- Relationship to p.d.f.: You can get the p.d.f. by differentiating the c.d.f.: f(x) = d/dx F(x).
- Universal Properties of C.D.F.:
- 0 ≤ F(x) ≤ 1 (It is a probability).
- F(x) is non-decreasing (i.e., if a < b, then F(a) ≤ F(b)).
- $\lim_{x \to -\infty} F(x) = 0$ (Probability of X ≤ -∞ is 0).
- $\lim_{x \to +\infty} F(x) = 1$ (Probability of X ≤ +∞ is 1).
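The step-function behaviour of a discrete c.d.f. can be sketched with the two-coin p.m.f. from earlier. The function name `cdf` is an illustrative choice.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x: float) -> Fraction:
    """F(x) = P(X <= x): add up p(t) over all support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

# F is flat between support points and jumps at 0, 1 and 2.
for x in (-1, 0, 0.5, 1, 1.99, 2, 5):
    print(x, cdf(x))
```

The size of each jump equals the p.m.f. at that point, for example F(1) - F(0.5) = p(1).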
1.3 Univariate Transformations
Often, we are interested in a function of a random variable. If we know the distribution of X, can we find the distribution of Y = g(X)?
Discrete Case
This is straightforward. The p.m.f. of Y is found by summing the probabilities of all x values that map to a given y value.
$p_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{\{x \,:\, g(x) = y\}} p_X(x)$
Example: Let X have p.m.f. p(-1)=0.1, p(0)=0.3, p(1)=0.4, p(2)=0.2.
Find the p.m.f. of Y = X².
- Y takes the values {(-1)², 0², 1², 2²} = {1, 0, 1, 4}, so its distinct possible values are {0, 1, 4}.
- pY(0) = P(Y=0) = P(X²=0) = P(X=0) = 0.3
- pY(1) = P(Y=1) = P(X²=1) = P(X=-1 or X=1) = pX(-1) + pX(1) = 0.1 + 0.4 = 0.5
- pY(4) = P(Y=4) = P(X²=4) = P(X=2) = 0.2
- The p.m.f. for Y is: pY(0)=0.3, pY(1)=0.5, pY(4)=0.2. (Check: 0.3+0.5+0.2 = 1).
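The "sum over all x that map to y" rule translates directly into a small loop. This is a sketch using exact fractions for the p.m.f. in the example; `transform_pmf` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction

# p.m.f. of X from the example: p(-1)=0.1, p(0)=0.3, p(1)=0.4, p(2)=0.2
pmf_x = {
    -1: Fraction(1, 10),
    0: Fraction(3, 10),
    1: Fraction(4, 10),
    2: Fraction(2, 10),
}

def transform_pmf(pmf, g):
    """Return the p.m.f. of Y = g(X) by pooling the mass of every x with g(x) = y."""
    pmf_y = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in pmf.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

pmf_y = transform_pmf(pmf_x, lambda x: x ** 2)
print(pmf_y)
```

Because x = -1 and x = 1 both map to y = 1, their probabilities are added; no mass is created or lost.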
Continuous Case (Change of Variable Technique)
This is more complex and requires calculus. If Y = g(X) is a monotonic (strictly increasing or decreasing) function, we can find the p.d.f. of Y.
- Find the inverse function: x = g⁻¹(y).
- Find the derivative of the inverse function: dx/dy.
- The p.d.f. of Y is given by the formula: $f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|$
The |dx/dy| term is called the Jacobian of the transformation. It rescales the density so that the total area remains 1.
Example: Let X be a continuous RV with p.d.f. fX(x) = 2x, for 0 < x < 1.
Find the p.d.f. of Y = 8X³.
- Find range of Y: If 0 < x < 1, then 0 < 8x³ < 8. So, 0 < y < 8.
- Find inverse: $y = 8x^3 \Rightarrow x^3 = y/8 \Rightarrow x = (y/8)^{1/3} = y^{1/3}/2$. So, $g^{-1}(y) = y^{1/3}/2$.
- Find derivative: $\frac{dx}{dy} = \frac{d}{dy}\left(\tfrac{1}{2}\, y^{1/3}\right) = \tfrac{1}{2} \cdot \tfrac{1}{3}\, y^{-2/3} = \frac{1}{6y^{2/3}}$.
- Apply formula:
- $f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|$
- $f_Y(y) = 2\, g^{-1}(y) \left| \frac{1}{6y^{2/3}} \right|$
- $f_Y(y) = 2 \cdot \frac{y^{1/3}}{2} \cdot \frac{1}{6y^{2/3}}$ (Since y > 0, absolute value is not needed)
- $f_Y(y) = y^{1/3} \cdot \frac{1}{6y^{2/3}} = \frac{y^{1/3 - 2/3}}{6} = \frac{y^{-1/3}}{6}$
- Final PDF: $f_Y(y) = \frac{1}{6y^{1/3}}$, for 0 < y < 8.
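One way to sanity-check this result is by simulation. The sketch below assumes inverse-c.d.f. sampling for X (since F_X(x) = x² on (0, 1), X can be drawn as the square root of a uniform number) and compares empirical frequencies of Y = 8X³ with the c.d.f. implied by the derived density, F_Y(y) = y^(2/3) / 4. The seed, sample size, and names are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 200_000

# Sample X by inverse-c.d.f.: F_X(x) = x**2 on (0, 1), so X = sqrt(U).
xs = [random.random() ** 0.5 for _ in range(n)]
ys = [8 * x ** 3 for x in xs]

def cdf_y(y: float) -> float:
    """C.d.f. obtained by integrating f_Y(y) = 1 / (6 y^(1/3)) from 0 to y."""
    return y ** (2 / 3) / 4

# Empirical P(Y <= y) at a few checkpoints, next to the formula.
checkpoints = (1.0, 4.0, 8.0)
empirical = {y: sum(v <= y for v in ys) / n for y in checkpoints}

for y in checkpoints:
    print(f"y={y}: empirical={empirical[y]:.4f}, formula={cdf_y(y):.4f}")
```

Agreement between the two columns is evidence, not proof, that the Jacobian factor was applied correctly.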
1.4 Two-Dimensional (Bivariate) Random Variables
We often need to study two or more random variables simultaneously. A bivariate random variable is an ordered pair (X, Y) that maps each outcome in a sample space S to a point in the 2D plane. Each component may be discrete or continuous, for example:
- (Discrete, Discrete): (Number of heads, Number of tails).
- (Continuous, Continuous): (Height, Weight).
- (Discrete, Continuous): (Number of children in a family, Annual income).
1.5 Joint, Marginal, and Conditional Distributions
Joint p.m.f. and p.d.f.
This is the 2D equivalent of a p.m.f./p.d.f. It describes the probability of X and Y *simultaneously* taking on certain values.
- Joint p.m.f. (Discrete): p(x, y) = P(X=x, Y=y)
- Properties: 1. p(x,y) ≥ 0, 2. $\sum_x \sum_y p(x,y) = 1$
- Joint p.d.f. (Continuous): f(x, y)
- Properties: 1. f(x,y) ≥ 0, 2. $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y)\,dx\,dy = 1$
- Probability is volume: $P(a < X < b,\ c < Y < d) = \int_c^d \int_a^b f(x,y)\,dx\,dy$.
Joint c.d.f.
F(x, y) = P(X ≤ x, Y ≤ y)
- Discrete: $F(x, y) = \sum_{s \le x} \sum_{t \le y} p(s, t)$
- Continuous: $F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt$
- We can get the joint p.d.f. from the c.d.f.: f(x,y) = ∂²F(x,y) / (∂x ∂y).
Marginal Distributions
The marginal distribution of X is the individual probability distribution of X, "ignoring" Y. We get it by "summing out" or "integrating out" the other variable from the joint distribution.
- Marginal p.m.f. for X (Discrete): $p_X(x) = P(X = x) = \sum_y p(x, y)$
Think of this as summing across the rows in a joint probability table.
- Marginal p.d.f. for X (Continuous): $f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy$
This gives the individual p.d.f. for X. The same logic applies for finding the marginal distribution of Y (sum/integrate over x).
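"Summing out" the other variable can be sketched on a small joint table. The joint p.m.f. below is a made-up illustration, not taken from these notes: toss two fair coins, let X be the number of heads on the first toss (0 or 1) and Y the total number of heads. The helper name `marginals` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed joint p.m.f. p(x, y): X = heads on first toss, Y = total heads.
q = Fraction(1, 4)
joint = {(0, 0): q, (0, 1): q, (1, 1): q, (1, 2): q}

def marginals(joint):
    """Return (p_X, p_Y) by summing the joint p.m.f. over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p  # sum over y for fixed x
        p_y[y] += p  # sum over x for fixed y
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
print(p_x)
print(p_y)
```

Each marginal is itself a valid p.m.f., so its values add up to 1.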
Conditional Distributions
The conditional distribution describes the probability of one variable *given that we know* the value of the other. It's like taking a "slice" of the joint distribution.
- Conditional p.m.f. of Y given X=x: p(y | x) = P(Y=y | X=x) = P(X=x, Y=y) / P(X=x) = p(x, y) / pX(x), defined when pX(x) > 0
- Conditional p.d.f. of Y given X=x: f(y | x) = f(x, y) / fX(x), defined when fX(x) > 0
- Multiplication rule: f(x, y) = f(y | x) * fX(x) and f(x, y) = f(x | y) * fY(y)
This is just a rearrangement of the conditional formula and is extremely useful in proofs.
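The "slice and renormalise" idea can be sketched on a small joint table. The joint p.m.f. here is an assumed illustration (two fair coins, X = heads on the first toss, Y = total heads), and `conditional_y_given_x` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed joint p.m.f. p(x, y): X = heads on first toss, Y = total heads.
q = Fraction(1, 4)
joint = {(0, 0): q, (0, 1): q, (1, 1): q, (1, 2): q}

def conditional_y_given_x(joint, x):
    """Return p(y | x) = p(x, y) / p_X(x) for a fixed x with p_X(x) > 0."""
    p_x = sum((p for (s, _), p in joint.items() if s == x), Fraction(0))
    if p_x == 0:
        raise ValueError("cannot condition on an event with probability zero")
    return {y: p / p_x for (s, y), p in joint.items() if s == x}

cond = conditional_y_given_x(joint, 1)
print(cond)

# Rearranged form: p(x, y) = p(y | x) * p_X(x), checked here for x = 1.
p_x1 = sum(p for (s, _), p in joint.items() if s == 1)
rebuilt = {(1, y): p * p_x1 for y, p in cond.items()}
print(rebuilt)
```

Dividing by the marginal is what makes the slice sum to 1, so each conditional distribution is a proper p.m.f. in its own right.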
1.6 Independence of Variables
Definition: Two random variables X and Y are independent if and only if their joint distribution function factors into the product of their individual marginal distribution functions.
- For all (x, y):
- Joint c.d.f.: F(x, y) = FX(x) * FY(y)
- Joint p.m.f./p.d.f.: f(x, y) = fX(x) * fY(y)
If X and Y are independent, then the conditional distribution is equal to the marginal distribution:
f(y | x) = f(x, y) / fX(x) = (fX(x) * fY(y)) / fX(x) = fY(y)
This makes intuitive sense: if the variables are independent, knowing the value of X gives you no new information about Y.
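For discrete variables, the factorisation condition can be checked cell by cell. The two joint tables below are assumed examples built from two fair coin tosses, and `is_independent` is a hypothetical helper name. Exact fractions are used so that equality tests are reliable.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check p(x, y) == p_X(x) * p_Y(y) for every (x, y) pair, including zero cells."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    zero = Fraction(0)
    p_x = {x: sum(joint.get((x, y), zero) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), zero) for x in xs) for y in ys}
    return all(
        joint.get((x, y), zero) == p_x[x] * p_y[y]
        for x, y in product(xs, ys)
    )

q = Fraction(1, 4)

# X = heads on first toss, Y = heads on second toss: separate coins.
separate_tosses = {(0, 0): q, (0, 1): q, (1, 0): q, (1, 1): q}

# X = heads on first toss, Y = total heads: Y depends on X.
first_and_total = {(0, 0): q, (0, 1): q, (1, 1): q, (1, 2): q}

print(is_independent(separate_tosses))
print(is_independent(first_and_total))
```

Cells missing from the table count as probability zero, which matters: a single pair (x, y) with p(x, y) = 0 but positive marginals is enough to rule out independence.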
1.7 Bivariate Transformations
This extends the univariate case. We have (X, Y) and want to find the joint p.d.f. of new variables, U and V, where:
U = g₁(X, Y) and V = g₂(X, Y)
The Jacobian Method (Change of Variables)
- Define the transformations: U = g₁(X, Y) and V = g₂(X, Y).
- Solve for the inverse functions: X = h₁(U, V) and Y = h₂(U, V).
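- Calculate the Jacobian determinant (J) of the inverse transformation. This is the determinant of a matrix of partial derivatives: $J = \det \begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}$
- The new joint p.d.f. for U and V is: $f_{U,V}(u, v) = f_{X,Y}(h_1(u, v),\, h_2(u, v))\, |J|$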
You must also transform the domain (the range of possible x, y values) into the new domain for u, v.
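Example: let X and Y be independent Gamma variables with a common scale parameter, and set U = X + Y and V = X / (X + Y). You will find that U and V are independent, where U is a Gamma variable and V is a Beta variable. This is a very common and important transformation.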
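As a worked sketch of the Jacobian method, take the standard Gamma-to-Beta setup. The specific choices below are assumptions made for illustration: X ~ Gamma(α, 1) and Y ~ Gamma(β, 1) independent, with U = X + Y and V = X / (X + Y).

```latex
% Step 1-2: transformation and its inverse
u = x + y, \quad v = \frac{x}{x + y}
\qquad\Longleftrightarrow\qquad
x = uv, \quad y = u(1 - v)

% Step 3: Jacobian of the inverse transformation
J = \det
\begin{pmatrix}
  \partial x / \partial u & \partial x / \partial v \\
  \partial y / \partial u & \partial y / \partial v
\end{pmatrix}
= \det
\begin{pmatrix}
  v & u \\
  1 - v & -u
\end{pmatrix}
= -uv - u(1 - v) = -u,
\qquad |J| = u

% Step 4: substitute into the joint p.d.f. of (X, Y)
f_{X,Y}(x, y) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)} \cdot
                \frac{y^{\beta - 1} e^{-y}}{\Gamma(\beta)},
\qquad x > 0,\; y > 0

f_{U,V}(u, v)
= \frac{(uv)^{\alpha - 1} \, (u(1 - v))^{\beta - 1} \, e^{-u}}
       {\Gamma(\alpha)\,\Gamma(\beta)} \cdot u
= \underbrace{\frac{u^{\alpha + \beta - 1} e^{-u}}{\Gamma(\alpha + \beta)}}_{\text{Gamma}(\alpha + \beta,\, 1)}
  \cdot
  \underbrace{\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}
              \, v^{\alpha - 1} (1 - v)^{\beta - 1}}_{\text{Beta}(\alpha,\, \beta)},
\qquad u > 0,\; 0 < v < 1
```

The domain x > 0, y > 0 maps to u > 0, 0 < v < 1, and the joint p.d.f. factors into a function of u alone times a function of v alone on that rectangular domain, which is the independence condition from section 1.6.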