DSC-152 LAB: Descriptive Statistics and Probability Distributions

Table of Contents

1. Course Details

2. Learning Objectives

The main goals of this practical course are:

3. Learning Outcomes

After successfully completing this lab course, you will be able to:

4. List of Practicals (with Notes)

This section details the 16 required practicals for the course.

Practical 1: Graphical representation of data.

Focus: This practical involves creating various statistical graphs to visualize data distributions.

Charts to Master:

Exam Tip: The x-coordinate of the point where the "Less Than" and "More Than" Ogive curves intersect is the Median of the distribution.
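The value the two ogives cross at can be checked numerically with the median-interpolation formula for grouped data. A minimal Python sketch (the class intervals and frequencies are made-up example data):

```python
# Median from a grouped frequency table via "less than" cumulative
# frequencies -- the same x-value where the two ogives intersect.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]

N = sum(freqs)
cum = []
running = 0
for f in freqs:
    running += f
    cum.append(running)          # "less than" cumulative frequencies

# Locate the median class: first class whose cumulative frequency >= N/2
for (low, high), f, cf in zip(classes, freqs, cum):
    if cf >= N / 2:
        cf_before = cf - f       # cumulative frequency before the median class
        h = high - low           # class width
        median = low + ((N / 2 - cf_before) / f) * h
        break

print(median)
```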

Practical 2: Problems based on measures of central tendency.

Focus: Calculating the "center" or "average" of a dataset using different methods.

Formulas to Apply:
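The standard measures can be verified with the Python standard library; a short sketch using made-up sample data:

```python
# Mean, median, and mode of raw (ungrouped) data.
from statistics import mean, median, mode

data = [12, 15, 11, 15, 14, 13, 15, 12]

print(mean(data))    # arithmetic mean = sum(x) / n
print(median(data))  # middle value of the sorted data
print(mode(data))    # most frequently occurring value
```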

Practical 3: Problems based on measures of dispersion.

Focus: Calculating the "spread" or "variability" of a dataset.

Formulas to Apply:
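A hand calculation of the usual dispersion measures can be cross-checked like this (population convention, dividing by n; the data are an invented example):

```python
import math

data = [4, 8, 6, 5, 3, 7, 9]
n = len(data)
xbar = sum(data) / n

rng = max(data) - min(data)                            # Range
mean_dev = sum(abs(x - xbar) for x in data) / n        # Mean deviation about mean
variance = sum((x - xbar) ** 2 for x in data) / n      # Variance (sigma^2)
std_dev = math.sqrt(variance)                          # Standard deviation (sigma)
```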

Practical 4: Problems based on combined mean and variance and coefficient of variation.

Focus: Combining statistics from two or more groups and comparing their variability.

Formulas to Apply:

A dataset with a lower C.V. is considered more consistent or less variable.
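The combined-group formulas and the C.V. comparison can be sketched as follows (group sizes, means, and variances below are hypothetical):

```python
# Combined mean: (n1*m1 + n2*m2) / (n1 + n2)
# Combined variance: [n1*(v1 + d1^2) + n2*(v2 + d2^2)] / (n1 + n2),
# where d_i is each group mean's deviation from the combined mean.

def combined_mean_var(n1, m1, v1, n2, m2, v2):
    m = (n1 * m1 + n2 * m2) / (n1 + n2)
    d1, d2 = m1 - m, m2 - m
    v = (n1 * (v1 + d1 ** 2) + n2 * (v2 + d2 ** 2)) / (n1 + n2)
    return m, v

def cv(mean, sd):
    """Coefficient of variation in %; the series with lower C.V. is more consistent."""
    return 100 * sd / mean

m, v = combined_mean_var(50, 60.0, 9.0, 50, 70.0, 16.0)
```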

Practical 5: Problems based on moments, skewness and kurtosis.

Focus: Describing the shape of the distribution.

Formulas to Apply:
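The central moments and the shape coefficients β₁ and β₂ can be computed directly; a minimal sketch on made-up data:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n

def mu(r):
    """r-th central moment about the mean."""
    return sum((x - xbar) ** r for x in data) / n

m2, m3, m4 = mu(2), mu(3), mu(4)
beta1 = m3 ** 2 / m2 ** 3   # skewness coefficient (0 for a symmetric distribution)
beta2 = m4 / m2 ** 2        # kurtosis coefficient (3 for the normal curve)
gamma2 = beta2 - 3          # excess kurtosis
```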

Practical 6: Fitting of polynomials, exponential curves.

Focus: Using the Principle of Least Squares to find the "best fit" curve for a set of (x, y) data points.

Curves to Fit:

Practical 7: Karl Pearson's correlation coefficient.

Focus: Calculating the strength and direction of the linear relationship between two quantitative variables (x, y).

Formula (r):

r = Cov(x, y) / (σₓ * σᵧ)

Computational Formula:

r = [ nΣxy - (Σx)(Σy) ] / sqrt( [nΣx² - (Σx)²] * [nΣy² - (Σy)²] )
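The computational formula translates directly into code; a minimal sketch (the perfectly linear sample data are invented so that r comes out to 1):

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfect positive linear relation
```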

Properties of 'r':

Practical 8: Correlation coefficient for a bivariate frequency distribution.

Focus: Calculating Karl Pearson's 'r' when the data is given in a bivariate frequency table (a "correlation table").

Formula:

r = [ NΣ(f * u * v) - (Σfᵤu)(Σfᵥv) ] / sqrt( [NΣ(fᵤu²) - (Σfᵤu)²] * [NΣ(fᵥv²) - (Σfᵥv)²] )
Correlation is independent of change of origin and scale. This is why we can use the (u, v) coded values to simplify calculations, and the 'r' value will be the same as for the original (x, y) data.
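The table calculation amounts to summing f·u, f·v, f·u², f·v², and f·u·v over every cell. A sketch with a hypothetical 3×3 correlation table of coded values:

```python
import math

# Coded values u, v (deviations from assumed means over the class width)
# and the cell frequencies f[i][j] of the correlation table -- example data.
u = [-1, 0, 1]
v = [-1, 0, 1]
f = [[4, 2, 0],
     [2, 6, 2],
     [0, 2, 4]]

N = sum(sum(row) for row in f)
sfu = sum(f[i][j] * u[i] for i in range(3) for j in range(3))
sfv = sum(f[i][j] * v[j] for i in range(3) for j in range(3))
sfu2 = sum(f[i][j] * u[i] ** 2 for i in range(3) for j in range(3))
sfv2 = sum(f[i][j] * v[j] ** 2 for i in range(3) for j in range(3))
sfuv = sum(f[i][j] * u[i] * v[j] for i in range(3) for j in range(3))

r = (N * sfuv - sfu * sfv) / math.sqrt(
    (N * sfu2 - sfu ** 2) * (N * sfv2 - sfv ** 2))
```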

Practical 9: Fitting of lines of regression.

Focus: Finding the "best fit" straight line (y = a + bx) to predict one variable from another.

Two Lines of Regression:

  1. Regression Line of Y on X: (Used to predict Y if X is known)
    (y - y-bar) = bᵧₓ * (x - x-bar)
    • bᵧₓ is the regression coefficient of Y on X.
    • bᵧₓ = Cov(x, y) / σₓ² = r * (σᵧ / σₓ)
  2. Regression Line of X on Y: (Used to predict X if Y is known)
    (x - x-bar) = bₓᵧ * (y - y-bar)
    • bₓᵧ is the regression coefficient of X on Y.
    • bₓᵧ = Cov(x, y) / σᵧ² = r * (σₓ / σᵧ)
Properties:
  • The two lines intersect at the point (x-bar, y-bar).
  • bᵧₓ * bₓᵧ = r², so |r| is the geometric mean of the two regression coefficients; r takes the common sign of bᵧₓ and bₓᵧ.
  • Both coefficients must have the same sign.
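Both regression coefficients come from the same covariance, so computing them together is straightforward; a sketch on made-up (x, y) data, including a prediction from the line of Y on X:

```python
def regression_coefficients(xs, ys):
    """b_yx = Cov(x,y)/var(x) and b_xy = Cov(x,y)/var(y); their product is r**2."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    vx = sum((x - xbar) ** 2 for x in xs) / n
    vy = sum((y - ybar) ** 2 for y in ys) / n
    return cov / vx, cov / vy

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
byx, bxy = regression_coefficients(xs, ys)

# Predict y at x = 6 with the line of Y on X: (y - ybar) = byx * (x - xbar)
xbar, ybar = sum(xs) / 5, sum(ys) / 5
y_pred = ybar + byx * (6 - xbar)
```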

Practical 10: Spearman rank correlation with and without ties.

Focus: Calculating the correlation between two variables when the data is ranked (ordinal). It measures the strength of a monotonic relationship (not just linear).

Formulas:
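When ties are present, assigning each tied value the mean of its ranks and then computing Pearson's r on those ranks gives Spearman's rho (equivalent to applying the d² formula with the tie-correction factor). A sketch with invented data:

```python
def average_ranks(values):
    """Rank the data, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson's r computed on the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

rho = spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])   # the two 7s are tied
```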

Practicals 11-14: Fitting of Discrete Distributions

(Practical 11: Binomial, 12: Poisson, 13: Negative Binomial, 14: Suitable distribution)

Focus: Given an observed frequency distribution (Oᵢ), find the expected frequencies (Eᵢ) according to a theoretical distribution (e.g., Binomial, Poisson).

General Procedure:

  1. Estimate Parameters:
    • Binomial (n, p): 'n' is usually given. Estimate 'p' by setting the observed mean (x-bar) equal to the theoretical mean (np).
      x-bar = np => p = x-bar / n
    • Poisson (λ): Estimate 'λ' by setting the observed mean (x-bar) equal to the theoretical mean (λ).
      λ = x-bar
  2. Calculate Probabilities:
    • Binomial: Use the p.m.f. P(x) = C(n, x)pˣ(1-p)ⁿ⁻ˣ to find P(0), P(1), P(2), ... P(n).
    • Poisson: Use the p.m.f. P(x) = (e^(−λ) * λˣ) / x! to find P(0), P(1), P(2), ...
    • Tip for Poisson: Use the recurrence relation: P(x) = P(x-1) * (λ / x). First, calculate P(0) = e^(−λ), then P(1) = P(0)*(λ/1), P(2) = P(1)*(λ/2), etc.
  3. Calculate Expected Frequencies (Eᵢ):
    Eᵢ = N * P(x)
    • Where N is the total observed frequency (N = ΣOᵢ).
  4. Compare: Create a table of Observed Frequencies (Oᵢ) and Expected Frequencies (Eᵢ) to see how good the fit is. (Later, a Chi-Square Goodness-of-Fit test is used).
"Fitting a suitable distribution" means you must first decide which one is appropriate.
  • Calculate the observed mean (x-bar) and variance (s²).
  • If x-bar ≈ s², a Poisson distribution is likely a good fit.
  • If s² < x-bar, a Binomial distribution is likely a good fit.
  • If s² > x-bar (over-dispersion), a Negative Binomial distribution is likely a good fit.
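The four steps above can be sketched for the Poisson case, using the P(0) = e^(−λ) recurrence (the observed frequency table is example data):

```python
import math

# Observed frequency distribution: x value -> observed count O_i
x_vals = [0, 1, 2, 3, 4]
observed = [109, 65, 22, 3, 1]

# Step 1: estimate lambda by the observed mean x-bar
N = sum(observed)
lam = sum(x * f for x, f in zip(x_vals, observed)) / N

# Steps 2-3: probabilities via P(x) = P(x-1) * lambda / x, starting
# from P(0) = e**(-lambda), then expected frequencies E_i = N * P(x)
p = math.exp(-lam)
expected = []
for x in x_vals:
    if x > 0:
        p = p * lam / x
    expected.append(N * p)

# Step 4: compare O_i against E_i row by row
for x, o, e in zip(x_vals, observed, expected):
    print(x, o, round(e, 2))
```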

Practicals 15-16: Applications & Fitting of Normal Distribution

(Practical 15: Applications of Normal distribution, 16: Fitting of Normal distribution)

Focus: Using the properties of the Normal curve to find probabilities and fit it to data.

Practical 15: Applications