Unit 3: Descriptive Statistics in Excel
Table of Contents
1. Introduction to Descriptive Statistics
Descriptive Statistics: A set of methods used to summarize and describe the main features of a dataset. It does not make inferences or predictions, it simply *describes* what the data shows.
It is typically broken down into two categories:
- Measures of Central Tendency: These find the "center" or "average" of the data (e.g., Mean, Median, Mode).
- Measures of Dispersion (or Variability): These describe the "spread" or "scatter" of the data (e.g., Variance, Standard Deviation, Range).
2. Measures of Central Tendency (Using Functions)
Excel has built-in functions for all major measures of central tendency. Assume your data is in the range A2:A101.
Mean (Arithmetic Average)
The sum of all values divided by the count of values.
Interpretation: The "balancing point" of the data. It is sensitive to outliers.
Median
The middle value when the data is sorted. 50% of data is above it, 50% is below.
Interpretation: The "true middle" of the data. It is not sensitive to outliers, making it a "robust" measure.
Mode
The value that occurs most frequently in the dataset.
Interpretation: The most "typical" or "popular" value. A dataset can have no mode or multiple modes (=MODE.MULT).
- If Mean ≈ Median: The data is likely symmetric.
- If Mean > Median: The data is positively skewed (pulled right by high outliers).
- If Mean < Median: The data is negatively skewed (pulled left by low outliers).
3. Measures of Dispersion (Using Functions)
These functions measure how spread out your data is. Assume data in A2:A101.
Range
The difference between the maximum and minimum values.
Interpretation: The total spread of the data. Very sensitive to outliers.
Variance
The average of the squared differences from the Mean. It measures the average spread.
=VAR.P(A2:A101)
Interpretation: Hard to interpret directly because its units are squared (e.g., "squared dollars"). Its main use is to calculate the standard deviation.
Standard Deviation
The square root of the variance. This is the most common measure of spread, expressed in the same units as the data.
=STDEV.P(A2:A101)
Interpretation: A "typical" or "average" distance of a data point from the mean.
- A small STDEV means data is tightly clustered around the mean (consistent).
- A large STDEV means data is very spread out (inconsistent).
4. Introduction to the Data Analysis ToolPak
The Analysis ToolPak is a powerful Excel add-in that can perform complex statistical analysis with a few clicks, generating a full report.
How to Enable the ToolPak (One-time setup):
- Go to File > Options.
- Click Add-ins on the left.
- At the bottom, in the Manage: box, select Excel Add-ins and click Go...
- Check the box for Analysis ToolPak and click OK.
- The Data Analysis button will now appear on your Data tab.
Using the "Descriptive Statistics" Tool:
- Go to the Data tab and click Data Analysis.
- Select Descriptive Statistics from the list and click OK.
- Input Range: Select your data column (e.g.,
A1:A101). - Labels in first row: Check this box if you selected the header.
- Output Range: Select a single empty cell where you want the report to start.
- Check the box: Summary statistics.
- Click OK.
5. Interpretation of Descriptive Statistics
When the ToolPak generates its report, you must be able to interpret it.
| Statistic | Interpretation |
|---|---|
| Mean | The "average" value. |
| Standard Error | A measure of how accurate the sample mean is as an estimate of the population mean. (Smaller is better). |
| Median | The 50th percentile, or "middle" value. Not sensitive to outliers. |
| Mode | The most frequent value. (#N/A means no mode). |
| Standard Deviation | The average spread around the mean. (Use this to understand variability). |
| Sample Variance | The square of the standard deviation. |
| Kurtosis | Measures "peakedness". (This is Excess Kurtosis, so 0 is normal, >0 is peaked, <0 is flat). |
| Skewness | Measures asymmetry. (0 is symmetric, >0 is positive skew, <0 is negative skew). |
| Range | Total spread (Max - Min). |
| Minimum | The smallest value in the dataset. |
| Maximum | The largest value in the dataset. |
| Sum | The total of all values. |
| Count | The number of observations (n). |
Key Analytical Questions:
- Central Location: What is the "typical" value? (Look at Mean and Median).
- Spread: How consistent is the data? (Look at Standard Deviation).
- Shape: Is the data symmetric or skewed? (Compare Mean vs. Median, or look at the Skewness value).
- Outliers: Are the Min and Max values very far from the Mean? Does the Mean differ greatly from the Median?