Unit 3: Descriptive Statistics in Excel

1. Introduction to Descriptive Statistics
2. Measures of Central Tendency (Using Functions)
3. Measures of Dispersion (Using Functions)
4. Introduction to the Data Analysis ToolPak
5. Interpretation of Descriptive Statistics

1. Introduction to Descriptive Statistics

Descriptive Statistics: A set of methods used to summarize and describe the main features of a dataset. It does not make inferences or predictions, it simply *describes* what the data shows.

It is typically broken down into two categories:

Measures of Central Tendency: These find the "center" or "average" of the data (e.g., Mean, Median, Mode).
Measures of Dispersion (or Variability): These describe the "spread" or "scatter" of the data (e.g., Variance, Standard Deviation, Range).

2. Measures of Central Tendency (Using Functions)

Excel has built-in functions for all major measures of central tendency. Assume your data is in the range A2:A101.

Mean (Arithmetic Average)

The sum of all values divided by the count of values.

=AVERAGE(A2:A101)

Interpretation: The "balancing point" of the data. It is sensitive to outliers.

Median

The middle value when the data is sorted. 50% of data is above it, 50% is below.

=MEDIAN(A2:A101)

Interpretation: The "true middle" of the data. It is not sensitive to outliers, making it a "robust" measure.

Mode

The value that occurs most frequently in the dataset.

=MODE.SNGL(A2:A101)

Interpretation: The most "typical" or "popular" value. A dataset can have no mode or multiple modes (=MODE.MULT).

Mean vs. Median: A key analysis step is to compare the mean and median.

If Mean ≈ Median: The data is likely symmetric.
If Mean > Median: The data is positively skewed (pulled right by high outliers).
If Mean < Median: The data is negatively skewed (pulled left by low outliers).

3. Measures of Dispersion (Using Functions)

These functions measure how spread out your data is. Assume data in A2:A101.

Range

The difference between the maximum and minimum values.

=MAX(A2:A101) - MIN(A2:A101)

Interpretation: The total spread of the data. Very sensitive to outliers.

Variance

The average of the squared differences from the Mean. It measures the average spread.

Sample vs. Population: Always use the .S (Sample) version unless you are 100% certain you have data for the *entire* population. The .S version divides by (n-1).

=VAR.S(A2:A101)
=VAR.P(A2:A101)

Interpretation: Hard to interpret directly because its units are squared (e.g., "squared dollars"). Its main use is to calculate the standard deviation.

Standard Deviation

The square root of the variance. This is the most common measure of spread, expressed in the same units as the data.

=STDEV.S(A2:A101)
=STDEV.P(A2:A101)

Interpretation: A "typical" or "average" distance of a data point from the mean.

A small STDEV means data is tightly clustered around the mean (consistent).
A large STDEV means data is very spread out (inconsistent).

4. Introduction to the Data Analysis ToolPak

The Analysis ToolPak is a powerful Excel add-in that can perform complex statistical analysis with a few clicks, generating a full report.

How to Enable the ToolPak (One-time setup):

Go to File > Options.
Click Add-ins on the left.
At the bottom, in the Manage: box, select Excel Add-ins and click Go...
Check the box for Analysis ToolPak and click OK.
The Data Analysis button will now appear on your Data tab.

Using the "Descriptive Statistics" Tool:

Go to the Data tab and click Data Analysis.
Select Descriptive Statistics from the list and click OK.
Input Range: Select your data column (e.g., A1:A101).
Labels in first row: Check this box if you selected the header.
Output Range: Select a single empty cell where you want the report to start.
Check the box: Summary statistics.
Click OK.

5. Interpretation of Descriptive Statistics

When the ToolPak generates its report, you must be able to interpret it.

Typical Data Analysis ToolPak Output
Statistic	Interpretation
Mean	The "average" value.
Standard Error	A measure of how accurate the sample mean is as an estimate of the population mean. (Smaller is better).
Median	The 50th percentile, or "middle" value. Not sensitive to outliers.
Mode	The most frequent value. (#N/A means no mode).
Standard Deviation	The average spread around the mean. (Use this to understand variability).
Sample Variance	The square of the standard deviation.
Kurtosis	Measures "peakedness". (This is Excess Kurtosis, so 0 is normal, >0 is peaked, <0 is flat).
Skewness	Measures asymmetry. (0 is symmetric, >0 is positive skew, <0 is negative skew).
Range	Total spread (Max - Min).
Minimum	The smallest value in the dataset.
Maximum	The largest value in the dataset.
Sum	The total of all values.
Count	The number of observations (n).

Key Analytical Questions:

Central Location: What is the "typical" value? (Look at Mean and Median).
Spread: How consistent is the data? (Look at Standard Deviation).
Shape: Is the data symmetric or skewed? (Compare Mean vs. Median, or look at the Skewness value).
Outliers: Are the Min and Max values very far from the Mean? Does the Mean differ greatly from the Median?

Knowlet