Unit 2: Measures of Central Tendency and Dispersion

This unit focuses on summarizing data using numerical values. We look at measures for the "center" of the data and measures for the "spread" of the data.

2.1 Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set.

Arithmetic Mean (A.M.)

The "average" you are most familiar with. It's the sum of all values divided by the number of values.

Median

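The middle value of a dataset that has been sorted in order of magnitude. With an odd number of values it is the single middle value; with an even number it is the average of the two middle values.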

Mode

The value that appears most frequently in the dataset.

Empirical Relationship (for unimodal, moderately skewed distributions):
Mean - Mode ≈ 3 * (Mean - Median)

For a positively (right) skewed distribution: Mean > Median > Mode

For a negatively (left) skewed distribution: Mean < Median < Mode
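
As a quick illustration of the three measures and of the ordering for a right-skewed dataset, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and were chosen only so that one large value pulls the mean to the right.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is large.
data = [1, 2, 2, 2, 3, 3, 4, 5, 14]

mean = statistics.mean(data)      # sum of all values / number of values
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)

# For this sample the right-skew ordering Mean > Median > Mode holds.
print("Mean > Median > Mode:", mean > median > mode)

# The empirical relationship is only approximate, so the two sides
# are printed for comparison rather than tested for equality.
print("Mean - Mode:", mean - mode)
print("3 * (Mean - Median):", 3 * (mean - median))
```

On a sample this small the two sides of the empirical relationship need not agree closely; the relationship is meant for larger, moderately skewed distributions.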

Geometric Mean (G.M.)

The n-th root of the product of n values. It is suitable for averaging ratios, percentages, or growth rates.

G.M. = (x₁ * x₂ * ... * xₙ)^(1/n)
Cannot be used if any data value is zero or negative.
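
The sketch below applies the G.M. formula to a set of growth factors, first directly and then with `statistics.geometric_mean` as a cross-check. The growth factors are hypothetical (yearly changes of +10%, +20% and -10%).

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
n = len(growth_factors)

# Direct use of the formula: n-th root of the product of the n values.
gm_direct = math.prod(growth_factors) ** (1 / n)

# The standard library provides the same measure (Python 3.8+).
gm_library = statistics.geometric_mean(growth_factors)

print("G.M. (formula):", gm_direct)
print("G.M. (library):", gm_library)
print("A.M. for comparison:", statistics.mean(growth_factors))
```

Applying the G.M. as a constant growth factor for all three years reproduces the same overall growth as the original factors, which is why it suits this kind of data better than the arithmetic mean.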

Harmonic Mean (H.M.)

The reciprocal of the arithmetic mean of the reciprocals. It is suitable for averaging rates and speeds.

H.M. = n / ( Σ(1/x) )
Cannot be used if any data value is zero.
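
A classic case for the H.M. is an average speed over equal distances. The sketch below assumes a hypothetical trip driven at 40 km/h one way and 60 km/h back over the same distance.

```python
import math
import statistics

# Hypothetical speeds (km/h) over two legs of equal distance.
speeds = [40, 60]

# Direct use of the formula: n / (sum of reciprocals).
hm_direct = len(speeds) / sum(1 / x for x in speeds)

# The standard library provides the same measure.
hm_library = statistics.harmonic_mean(speeds)

# Cross-check from first principles: total distance / total time,
# using an assumed leg length of 120 km.
leg_km = 120
average_speed = (2 * leg_km) / (leg_km / 40 + leg_km / 60)

print("H.M. (formula):", hm_direct)
print("H.M. (library):", hm_library)
print("Total distance / total time:", average_speed)
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent on the slower leg.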

2.2 Partition Values

These are values that divide a sorted dataset into equal parts.

The formula for any partition value in grouped data is a generalization of the median formula. For example, to find Q₁:

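Q₁ = L + [ (N/4 - C) / f ] * h

where L is the lower boundary of the class containing Q₁, N is the total frequency, C is the cumulative frequency of the classes before the Q₁ class, f is the frequency of the Q₁ class, and h is the class width.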
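
The Q₁ formula can be sketched in code as follows. This assumes the usual reading of the symbols (L = lower boundary of the Q₁ class, C = cumulative frequency before that class, f = its frequency, h = the common class width); the function name `grouped_quartile_1` and the frequency table are hypothetical.

```python
def grouped_quartile_1(lower_bounds, frequencies, width):
    """Q1 for grouped data with equal class widths.

    lower_bounds: lower boundary of each class, in ascending order
    frequencies:  frequency of each class
    width:        common class width h
    """
    total = sum(frequencies)          # N
    if total == 0:
        raise ValueError("frequency table is empty")
    target = total / 4                # N/4
    cumulative = 0                    # C, cumulative frequency so far
    for lower, freq in zip(lower_bounds, frequencies):
        # The Q1 class is the first class whose cumulative
        # frequency reaches N/4.
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("lower_bounds and frequencies do not match")


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
q1 = grouped_quartile_1([0, 10, 20, 30, 40], [5, 8, 12, 10, 5], 10)
print("Q1:", q1)
```

Here N = 40, so N/4 = 10 falls in the class 10-20 (C = 5, f = 8), giving Q₁ = 10 + (5/8) * 10 = 16.25.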

2.3 Measures of Dispersion

These measures describe the spread, variability, or scatter of the data. A low value means the data is clustered tightly around the center, while a high value means it is spread out.

Range

The simplest measure. Range = Maximum Value - Minimum Value.

Quartile Deviation (Q.D.)

Also called the Semi-Interquartile Range. It measures the spread of the middle 50% of the data.

Q.D. = (Q₃ - Q₁) / 2
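
A minimal sketch of the Range and Q.D. on ungrouped data follows. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of `statistics.quantiles`, and the data values are hypothetical.

```python
import statistics

# Hypothetical sorted sample of 11 values.
data = [2, 4, 5, 7, 9, 10, 12, 15, 18, 20, 21]

# Range = Maximum Value - Minimum Value.
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Q.D. = (Q3 - Q1) / 2
quartile_deviation = (q3 - q1) / 2

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3)
print("Q.D.:", quartile_deviation)
```

Unlike the Range, the Q.D. ignores the lowest and highest quarters of the data, so a single extreme value does not change it.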

Mean Deviation (M.D.)

The average of the absolute differences between each data point and the mean (or median).

M.D. (about mean) = ( Σ |x - x-bar| ) / n
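
The M.D. formula can be sketched as below, once about the mean and once about the median. The helper name `mean_deviation` and the data are hypothetical.

```python
import statistics


def mean_deviation(values, center):
    """Average absolute difference between each value and `center`."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical sample with one large value.
data = [1, 2, 3, 4, 10]

md_about_mean = mean_deviation(data, statistics.mean(data))
md_about_median = mean_deviation(data, statistics.median(data))

print("M.D. about mean:", md_about_mean)
print("M.D. about median:", md_about_median)
```

For this sample the deviation about the median is the smaller of the two; the median is the point that minimizes the sum of absolute deviations.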

Variance (σ²) and Standard Deviation (σ)

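The most important and widely used measures of dispersion. The variance is the average of the squared deviations from the mean, and the standard deviation is its positive square root, so it is in the same units as the data.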

Computational Formula for Variance:

σ² = [ (Σfx²) / N ] - (x-bar)²

(Average of the squares) - (Square of the average)
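
The computational formula can be checked against the definition (average squared deviation from the mean) with a small frequency distribution. The values and frequencies below are hypothetical.

```python
import math

# Hypothetical frequency distribution: value x occurs f times.
values = [1, 2, 3, 4]
freqs = [1, 2, 3, 4]

total = sum(freqs)                                          # N
mean = sum(f * x for x, f in zip(values, freqs)) / total    # x-bar

# Computational formula: (average of the squares) - (square of the average).
mean_of_squares = sum(f * x**2 for x, f in zip(values, freqs)) / total
variance = mean_of_squares - mean**2

# Definition: average of the squared deviations from the mean.
variance_by_definition = (
    sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total
)

std_dev = math.sqrt(variance)

print("mean:", mean)
print("variance (computational):", variance)
print("variance (definition):", variance_by_definition)
print("standard deviation:", std_dev)
```

With floating-point data whose mean is large relative to its spread, the computational form can lose precision, so the definitional form is the safer choice in code even though the computational form is quicker by hand.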

2.4 Relative Measures of Dispersion

The measures above (Range, SD) are absolute and are in the same units as the data. To compare the variability of two different datasets (e.g., heights in cm vs. weights in kg), we need relative measures (unit-free coefficients).

Coefficient of Variation (C.V.)

The most important relative measure. It expresses the standard deviation as a percentage of the mean.

C.V. = (Standard Deviation / |Mean|) * 100
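
To illustrate comparing variability across different units, the sketch below computes the C.V. for heights in cm and weights in kg. Both datasets are hypothetical, the helper name `coefficient_of_variation` is an assumption, and the population standard deviation (`statistics.pstdev`) is assumed to match σ as used in these notes.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = (standard deviation / |mean|) * 100, in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean) * 100


# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print("C.V. of heights (%):", cv_heights)
print("C.V. of weights (%):", cv_weights)
print("Weights are relatively more variable:", cv_weights > cv_heights)
```

Because the units cancel in the ratio, the two percentages can be compared directly even though the standard deviations themselves (in cm and kg) cannot.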
Uses of C.V.:

Other Coefficients:

2.5 Moments

Moments are statistical measures that describe the characteristics of a distribution's shape.

Raw Moments (μ'ᵣ)

Moments about the origin (zero). The r-th raw moment is E[Xʳ].

Central Moments (μᵣ)

Moments about the mean. The r-th central moment is E[(X - μ)ʳ].
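
For a finite dataset, the expectations above can be sketched as plain averages over the observations. That reading is an assumption of this example, and the function names and data are hypothetical.

```python
import math


def raw_moment(values, r):
    """r-th moment about the origin: average of x**r."""
    return sum(x**r for x in values) / len(values)


def central_moment(values, r):
    """r-th moment about the mean: average of (x - mean)**r."""
    mean = raw_moment(values, 1)
    return sum((x - mean) ** r for x in values) / len(values)


# Hypothetical sample.
data = [1, 2, 3, 4, 10]

mu1_raw = raw_moment(data, 1)      # first raw moment = the mean
mu2_raw = raw_moment(data, 2)
mu1 = central_moment(data, 1)      # always 0
mu2 = central_moment(data, 2)      # the variance

print("first raw moment (mean):", mu1_raw)
print("second raw moment:", mu2_raw)
print("first central moment:", mu1)
print("second central moment (variance):", mu2)

# Link between the two kinds of moment: mu2 = mu'2 - (mu'1)**2.
print("identity holds:", math.isclose(mu2, mu2_raw - mu1_raw**2))
```

The last line is the same "average of the squares minus square of the average" identity used in the computational formula for the variance.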

2.6 Measures of Skewness and Kurtosis

Skewness (Asymmetry)

Skewness measures the lack of symmetry in a distribution.

[Image: three distributions, negatively skewed (left tail), symmetric (bell curve), and positively skewed (right tail)]

Measures of Skewness:

Kurtosis (Peakedness)

Kurtosis measures the "tailedness" and "peakedness" of a distribution compared to a Normal (bell-shaped) distribution.

Measure of Kurtosis:
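
As an illustration only, the sketch below assumes the moment-based coefficients μ₃ / μ₂^(3/2) for skewness and β₂ = μ₄ / μ₂² for kurtosis (β₂ = 3 for a Normal distribution). Other measures exist and these notes may intend a different one; the function names and data are hypothetical.

```python
import math


def central_moment(values, r):
    """r-th moment about the mean, computed as a plain average."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)


def moment_skewness(values):
    """Assumed measure: mu3 / mu2**1.5 (0 for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def moment_kurtosis(values):
    """Assumed measure: beta2 = mu4 / mu2**2 (3 for a Normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]

print("skewness (symmetric):", moment_skewness(symmetric))
print("skewness (right-skewed):", moment_skewness(right_skewed))
print("beta2 (symmetric):", moment_kurtosis(symmetric))
print("beta2 (right-skewed):", moment_kurtosis(right_skewed))
```

Under these assumed measures, a positive skewness value indicates a longer right tail and a negative value a longer left tail, while β₂ above or below 3 indicates heavier or lighter tails than the Normal distribution.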