Define the terms 'population' and 'sample'. Give examples.
Define nominal and ordinal data, citing examples.
Grouped vs Ungrouped frequency distributions, their uses, and considerations for class intervals.
Ungrouped: Data listed individually with their respective frequencies. Use: Suitable for small datasets with limited ranges.
Grouped: Data organized into classes or intervals (e.g., 0-10, 10-20). Use: Essential for large datasets to make them concise and manageable.
Considerations for Class Intervals:
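To make the ungrouped/grouped distinction concrete, here is a minimal Python sketch. The marks are made-up data, and the function names are hypothetical. One class-interval choice is assumed rather than taken from the notes: classes are equal-width and the upper limit is excluded, so with classes 0-10 and 10-20 a value of 10 is counted in 10-20.

```python
from collections import Counter


def ungrouped_frequency(values):
    """Frequency of each distinct value, sorted by value."""
    return dict(sorted(Counter(values).items()))


def grouped_frequency(values, start, width, n_classes):
    """Counts for equal-width classes [lower, upper); the upper limit is excluded.

    Values that fall outside every class are ignored.
    """
    counts = {}
    for i in range(n_classes):
        lower = start + i * width
        counts[(lower, lower + width)] = 0
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            lower = start + index * width
            counts[(lower, lower + width)] += 1
    return counts


marks = [3, 7, 7, 10, 12, 15, 15, 15, 18, 20, 24, 29]  # made-up data
ungrouped = ungrouped_frequency(marks)
grouped = grouped_frequency(marks, start=0, width=10, n_classes=3)

print(ungrouped)
print(grouped)
```

The ungrouped table has one row per distinct mark, while the grouped table collapses the same twelve observations into three classes.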
Method of constructing histogram and frequency polygon.
Histogram:
Frequency Polygon:
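As a rough sketch of the usual textbook construction (an assumption, since the steps are not written out here): a histogram draws one rectangle per class with area proportional to the class frequency, and a frequency polygon joins the points (class midpoint, frequency), closed at both ends by a zero-frequency class. The Python below only computes the coordinates to plot; the class limits, frequencies, and function names are hypothetical.

```python
def histogram_bars(class_limits, frequencies):
    """One (lower, upper, height) rectangle per class.

    Height is frequency / class width, so the area of each bar equals
    the class frequency even when widths differ.
    """
    bars = []
    for (lower, upper), f in zip(class_limits, frequencies):
        bars.append((lower, upper, f / (upper - lower)))
    return bars


def frequency_polygon_points(class_limits, frequencies):
    """(midpoint, frequency) points, closed by a zero-frequency class at each end."""
    first_lower, first_upper = class_limits[0]
    last_lower, last_upper = class_limits[-1]
    first_width = first_upper - first_lower
    last_width = last_upper - last_lower

    points = [(first_lower - first_width / 2, 0)]
    for (lower, upper), f in zip(class_limits, frequencies):
        points.append(((lower + upper) / 2, f))
    points.append((last_upper + last_width / 2, 0))
    return points


classes = [(0, 10), (10, 20), (20, 30)]  # made-up equal-width classes
freqs = [3, 6, 3]

bars = histogram_bars(classes, freqs)
polygon = frequency_polygon_points(classes, freqs)

print(bars)
print(polygon)
```

With equal class widths the bar heights are simply proportional to the frequencies, so plotting the raw frequencies gives the same shape.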
Meaning and measures of central tendency.
A measure of central tendency is a single value that describes a set of data by identifying the central position within that set.
Measures: Arithmetic Mean, Median, Mode, Geometric Mean, and Harmonic Mean.
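The five measures can be compared on one small dataset. This sketch uses Python's standard `statistics` module; the data values are made up, and are kept positive because the geometric and harmonic means require positive observations.

```python
import statistics

data = [2, 4, 4, 8]  # made-up observations, all positive

arithmetic_mean = statistics.mean(data)            # (2 + 4 + 4 + 8) / 4
median = statistics.median(data)                   # middle of the sorted values
mode = statistics.mode(data)                       # most frequent value
geometric_mean = statistics.geometric_mean(data)   # 4th root of the product
harmonic_mean = statistics.harmonic_mean(data)     # n / sum of reciprocals

print(arithmetic_mean, median, mode, geometric_mean, harmonic_mean)
```

For positive data the ordering HM <= GM <= AM holds, which this example also illustrates.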
Define Arithmetic Mean (AM). Show effect of change of origin/scale and sum of deviations.
Definition: The sum of all observations divided by the total number of observations.
Change of Origin and Scale: If Y = (X - A) / h, then Mean(Y) = (Mean(X) - A) / h. This shows that AM is affected by both change of origin and change of scale.
Sum of Deviations: The sum of (Xi - Mean) is always zero.
Sum(x - x_bar) = Sum(x) - n*x_bar = n*x_bar - n*x_bar = 0.
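Both results can be checked numerically. The sketch below uses made-up observations and arbitrary values A = 18 and h = 3; it illustrates the identities but does not replace the algebraic proof.

```python
def mean(values):
    """Arithmetic mean: sum of observations divided by their count."""
    return sum(values) / len(values)


x = [12, 15, 18, 21, 24]  # made-up observations
A, h = 18, 3              # arbitrary origin and scale

y = [(xi - A) / h for xi in x]

mean_x = mean(x)
mean_y = mean(y)

# Change of origin and scale: Mean(Y) should equal (Mean(X) - A) / h.
transformed_mean = (mean_x - A) / h

# Sum of deviations about the mean should be zero.
deviation_sum = sum(xi - mean_x for xi in x)

print(mean_x, mean_y, transformed_mean, deviation_sum)
```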
Standard Deviation (SD) and effects of origin/scale.
Definition: The positive square root of the arithmetic mean of the squares of deviations of the observations from their AM.
Origin/Scale Effect:
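The usual textbook result is that SD is unaffected by a change of origin but is affected by a change of scale: if Y = (X - A) / h, then SD(Y) = SD(X) / |h|. A small numerical check in Python, with made-up data and arbitrary A and h, and assuming the population form of SD (divisor n) from the definition above:

```python
import math


def population_sd(values):
    """Positive square root of the mean squared deviation from the AM."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


x = [12, 15, 18, 21, 24]  # made-up observations
A, h = 18, 3              # arbitrary origin and scale

sd_x = population_sd(x)
sd_shifted = population_sd([xi - A for xi in x])        # origin changed only
sd_scaled = population_sd([(xi - A) / h for xi in x])   # origin and scale changed

print(sd_x, sd_shifted, sd_scaled)
```

Here `sd_shifted` matches `sd_x`, while `sd_scaled` matches `sd_x / abs(h)`.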
Karl Pearson's correlation coefficient and effects of origin/scale.
It measures the degree of linear relationship between two variables.
Effect: It is independent of change of origin and scale.
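A short Python sketch of the coefficient computed from its definition, r = Sum((x - x_bar)(y - y_bar)) / sqrt(Sum((x - x_bar)^2) * Sum((y - y_bar)^2)). The data and the constants a, h, b, k are made up. The last check uses Y = X² on values symmetric about zero, a case where r is 0 even though Y is fully determined by X.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)

# Change of origin and scale with positive scale factors: r is unchanged.
a, h, b, k = 3, 2, 4, 5
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]
r_uv = pearson_r(u, v)

# Perfect but non-linear dependence: Y = X^2 on symmetric X values.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [xi ** 2 for xi in x_sym]
r_quadratic = pearson_r(x_sym, y_sq)

print(r_xy, r_uv, r_quadratic)
```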
Prove r is independent of change of origin/scale and discuss r=0.
Correlation coefficient 'r' remains the same even if variables are transformed as U = (X-a)/h and V = (Y-b)/k, provided h and k have the same sign (if their signs differ, only the sign of r changes).
Independent Variables: If X and Y are independent, r = 0. However, the converse is not always true; for example, in a perfect quadratic relationship (Y = X²) with X distributed symmetrically about zero, r is 0 even though Y is completely determined by X.
Regression coefficient, properties, and why two lines of regression?
Properties:
Why two lines? There are two lines because the "dependent" and "independent" variables switch roles. The line of Y on X minimizes the sum of squares of vertical deviations, and the line of X on Y minimizes the sum of squares of horizontal deviations.
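The two lines can be computed side by side. This Python sketch uses made-up data and a hypothetical function name, and checks two commonly cited properties that are assumed here rather than taken from the notes: the product of the two regression coefficients equals r², and both lines pass through the point of means.

```python
def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) for the two least-squares lines.

    Y on X:  Y - y_bar = b_yx * (X - x_bar)   (vertical deviations minimized)
    X on Y:  X - x_bar = b_xy * (Y - y_bar)   (horizontal deviations minimized)
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / sxx, sxy / syy


x = [1, 2, 3, 4, 5]  # made-up paired observations
y = [2, 4, 5, 4, 5]

x_bar, y_bar, b_yx, b_xy = regression_coefficients(x, y)
r_squared = b_yx * b_xy  # product of the two regression coefficients

print(f"Y on X: Y - {y_bar} = {b_yx} * (X - {x_bar})")
print(f"X on Y: X - {x_bar} = {b_xy} * (Y - {y_bar})")
print("r^2 =", r_squared)
```

The two slopes differ unless the correlation is perfect, which is why the lines do not coincide in general.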
Axiomatic definition of probability and proofs.
Probability is a set function P(A) satisfying: (1) P(A) >= 0 for every event A, (2) P(S) = 1 for the sample space S, (3) for mutually disjoint events A1, A2, ..., P(Union Ai) = Sum P(Ai).
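As a concrete illustration, the three axioms can be checked on a finite sample space with equally likely outcomes. The fair die, the chosen events, and the function name below are assumptions made only for this sketch; the complement rule P(A') = 1 - P(A) is included as one simple consequence of the axioms.

```python
from fractions import Fraction

sample_space = frozenset(range(1, 7))  # assumed: one roll of a fair die


def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = frozenset({2, 4, 6})
low_odd = frozenset({1, 3})  # disjoint from `even`

axiom_1 = prob(even) >= 0
axiom_2 = prob(sample_space) == 1
axiom_3 = prob(even | low_odd) == prob(even) + prob(low_odd)
complement_rule = prob(sample_space - even) == 1 - prob(even)

print(axiom_1, axiom_2, axiom_3, complement_rule)
```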
Proofs:
Four cards drawn from 52 (assumed "pack of 5" is a typo in source). Probability of Ace, King, Queen, Jack.
Total ways: 52C4 = 270725.
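A sketch of the remaining count in Python, assuming the question asks for exactly one Ace, one King, one Queen, and one Jack (of any suits) among the four cards drawn without replacement. Each rank can then be chosen in 4C1 ways.

```python
from fractions import Fraction
from math import comb

total_ways = comb(52, 4)           # ways to draw any 4 cards from 52
favourable_ways = comb(4, 1) ** 4  # one card from each of the four ranks

probability = Fraction(favourable_ways, total_ways)

print(total_ways, favourable_ways, probability)
```

Under that interpretation the probability is 256/270725.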
Multiplication theorem of probability.
For two events A and B: P(A ∩ B) = P(A) * P(B|A) = P(B) * P(A|B). If independent, P(A ∩ B) = P(A) * P(B).
Problem solving probability for students A, B, C (1/4, 1/2, 3/4).
P(problem is solved) = 1 - P(none of them solves it), assuming the three students work independently.
P(Solved) = 1 - (3/4 * 1/2 * 1/4) = 1 - 3/32 = 29/32.
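The same arithmetic in exact fractions, as a quick check. The sketch assumes the three students attempt the problem independently, so the probabilities of failure multiply.

```python
from fractions import Fraction

# Probability that each student solves the problem, as given in the question.
p_solve = {"A": Fraction(1, 4), "B": Fraction(1, 2), "C": Fraction(3, 4)}

# Independence assumed: P(none solve) is the product of the failure probabilities.
p_none = Fraction(1)
for p in p_solve.values():
    p_none *= 1 - p

p_solved = 1 - p_none

print(p_none, p_solved)
```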
Bayes' Theorem Application (Urns I, II, III).
Let E be the event of drawing 1 White and 1 Red ball.
Assume P(I)=P(II)=P(III)=1/3. Use Bayes' Formula: P(I|E) = P(I)P(E|I) / Sum[P(i)P(E|i)].
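The formula can be sketched in Python once urn contents are fixed. The compositions below are hypothetical placeholders chosen only for illustration and should be replaced by the ones in the actual question. The sketch also assumes two balls are drawn without replacement from the selected urn.

```python
from fractions import Fraction
from math import comb

# HYPOTHETICAL urn contents, for illustration only.
urns = {
    "I": {"white": 1, "black": 2, "red": 3},
    "II": {"white": 2, "black": 1, "red": 1},
    "III": {"white": 4, "black": 5, "red": 3},
}


def p_white_and_red(contents):
    """P(E | urn): one white and one red when two balls are drawn without replacement."""
    total = sum(contents.values())
    return Fraction(contents["white"] * contents["red"], comb(total, 2))


prior = Fraction(1, 3)  # P(I) = P(II) = P(III) = 1/3

joint = {name: prior * p_white_and_red(c) for name, c in urns.items()}
evidence = sum(joint.values())  # Sum[P(i) P(E|i)]
posterior = {name: j / evidence for name, j in joint.items()}

for name, p in posterior.items():
    print(f"P({name}|E) = {p}")
```

The posteriors always sum to 1, which is a useful sanity check whatever compositions are used.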