FYUG Even Semester Exam, 2025 STATISTICS (2nd Semester) Course No.: STADSM-151

Paper Name: Statistical Methods and Probability

Full Marks: 70 | Pass Marks: 28

Time: 3 Hours

UNIT-I

Question 1(a) [2]

Define the terms 'population' and 'sample'. Give examples.

Population: The entire group of individuals or objects that are the subject of a statistical investigation. Example: All students enrolled in a university.

Sample: A finite subset of individuals selected from the population with the objective of investigating its properties. Example: 100 students selected from the university to study their spending habits.

Question 1(b) [2]

Define nominal and ordinal data citing examples.

Nominal Data: Data used for labeling variables without any quantitative value or order. Example: Gender (Male, Female), Eye Color.

Ordinal Data: Data where the variables have a natural, ordered hierarchy, but the distances between categories are not known. Example: Customer satisfaction ratings (Satisfied, Neutral, Dissatisfied).

Question 2(a) [5]

Grouped vs Ungrouped frequency distributions, their uses, and considerations for class intervals.

Ungrouped: Data listed individually with their respective frequencies. Use: Suitable for small datasets with limited ranges.

Grouped: Data organized into classes or intervals (e.g., 0-10, 10-20). Use: Essential for large datasets to make them concise and manageable.

Considerations for Class Intervals:

The magnitude should preferably be uniform throughout.

The number of classes should ideally be between 5 and 15. Too few classes lose detail, while too many defeat the purpose of summarizing the data.

Intervals should be mutually exclusive (no overlap), and together they should cover every observation.
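The considerations above (uniform width, a moderate number of classes, non-overlapping intervals) can be illustrated with a short Python sketch. The function name `grouped_frequency` and the marks are hypothetical. The sketch assumes the exclusive method, where each class includes its lower limit and excludes its upper limit (so 10 falls in 10-20, not 0-10).

```python
from collections import Counter

def grouped_frequency(data, start, width, num_classes):
    """Tally data into num_classes equal-width classes [lower, lower + width).

    Exclusive method: a value equal to an upper limit is counted in the
    next class, so the classes never overlap.
    """
    counts = Counter()
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} lies outside the chosen classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]

# Hypothetical marks of 12 students
marks = [3, 7, 10, 12, 15, 19, 20, 24, 25, 31, 38, 40]
for lower, upper, f in grouped_frequency(marks, start=0, width=10, num_classes=5):
    print(f"{lower}-{upper}: {f}")
```

Raising an error for values outside the classes makes it obvious when the chosen intervals fail to cover all observations.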

Question 2(b) [5]

Method of constructing histogram and frequency polygon.

Histogram:

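Mark class boundaries on the X-axis.
Mark frequencies on the Y-axis (frequency densities if the class widths are unequal).

Draw adjacent rectangles, one per class, whose width is the class interval. With equal class widths the height is the class frequency; with unequal widths the height is the frequency density (frequency / class width), so that the area of each rectangle is proportional to its class frequency.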
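The rule for rectangle heights can be checked with a minimal Python sketch. The function name `histogram_heights` and the class data are hypothetical. Each height is computed as frequency density, so the area (height × width) recovers the class frequency even when one class is wider than the others.

```python
def histogram_heights(classes):
    """Return (lower, upper, height) for each histogram rectangle.

    classes: list of (lower_boundary, upper_boundary, frequency).
    Height = frequency / width (frequency density), so each rectangle's
    area equals its class frequency even when widths differ.
    """
    return [(lo, hi, f / (hi - lo)) for lo, hi, f in classes]

# Hypothetical distribution with one wider class (20-40)
classes = [(0, 10, 5), (10, 20, 8), (20, 40, 12)]
for lo, hi, h in histogram_heights(classes):
    print(f"{lo}-{hi}: height {h}, area {h * (hi - lo)}")
```

When all widths are equal, the heights are simply the frequencies divided by a common constant, which is why plotting raw frequencies is acceptable in that case.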

Frequency Polygon:

Find the mid-point (class mark) of each class interval.
Plot points with mid-points as X-coordinates and frequencies as Y-coordinates.
Join these points with straight lines. Close the polygon by joining its end points to the X-axis at the mid-points of the imaginary zero-frequency classes just before the first class and just after the last class.
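The steps for the frequency polygon can be sketched in Python as a list of vertices. The function name `frequency_polygon_points` and the class data are hypothetical, and the sketch assumes equal class widths so that the closing points lie one width beyond the first and last class marks.

```python
def frequency_polygon_points(classes):
    """Return the (x, y) vertices of a frequency polygon.

    classes: list of (lower, upper, frequency) with equal widths.
    A zero-frequency class mark is added one width beyond each end
    so that the polygon closes on the X-axis.
    """
    width = classes[0][1] - classes[0][0]
    marks = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks (mid-points)
    points = [(m, f) for m, (_, _, f) in zip(marks, classes)]
    return [(marks[0] - width, 0)] + points + [(marks[-1] + width, 0)]

# Hypothetical grouped data
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 7)]
print(frequency_polygon_points(classes))
```

Joining these vertices in order with straight lines gives the closed polygon.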

UNIT-II

Question 3(a) [2]

Meaning and measures of central tendency.

A measure of central tendency is a single value that describes a set of data by identifying the central position within that set.

Measures: Arithmetic Mean, Median, Mode, Geometric Mean, and Harmonic Mean.
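The five measures listed above can be computed with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). The data are hypothetical and all positive, since the geometric and harmonic means require positive values.

```python
import statistics

data = [2, 4, 4, 5, 8, 16]  # hypothetical observations (all positive)

measures = {
    "Arithmetic Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Mode": statistics.mode(data),
    "Geometric Mean": statistics.geometric_mean(data),
    "Harmonic Mean": statistics.harmonic_mean(data),
}
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For positive data the output also illustrates the standard inequality AM >= GM >= HM.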

Question 4(a) [5]

Define Arithmetic Mean (AM). Show effect of change of origin/scale and sum of deviations.

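Definition: The sum of all observations divided by the total number of observations, i.e. x_bar = Sum(x) / n.

Change of Origin and Scale: If Y = (X - A) / h, then Mean(Y) = (1/n) Sum[(X - A) / h] = (Sum(X) - nA) / (nh) = (Mean(X) - A) / h, i.e. Mean(X) = A + h * Mean(Y). Hence the AM is affected by change of both origin and scale.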

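Sum of Deviations: The sum of the deviations of the observations from their AM is always zero. Since x_bar is a constant, Sum(x_bar) = n*x_bar, and Sum(x) = n*x_bar by definition:

Sum(x - x_bar) = Sum(x) - Sum(x_bar) = n*x_bar - n*x_bar = 0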
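Both results can be checked on a small example with exact fractions in Python. The observations and the choice A = 15, h = 3 are hypothetical; the helper `mean` is defined here for illustration. This is a numerical check on one data set, not a substitute for the proof.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean computed exactly with fractions."""
    return sum(values, Fraction(0)) / len(values)

x = [12, 15, 18, 21, 24]               # hypothetical observations
A, h = 15, 3                           # chosen origin A and scale h
y = [Fraction(xi - A, h) for xi in x]  # Y = (X - A) / h

# Change of origin and scale: Mean(Y) equals (Mean(X) - A) / h
print(mean(x), mean(y), (mean(x) - A) / h)

# Sum of deviations from the mean is zero
print(sum(xi - mean(x) for xi in x))
```

Using `Fraction` avoids floating-point rounding, so the sum of deviations comes out as exactly 0.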

Question 4(c) [5]

Standard Deviation (SD) and effects of origin/scale.

Definition: The positive square root of the arithmetic mean of the squares of deviations of the observations from their AM.

Origin/Scale Effect:

Origin: Independent. If we add a constant 'a' to all values, SD remains unchanged.
Scale: Dependent. If we multiply all values by 'k', the new SD is |k| * old SD.
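The origin and scale effects on the SD can be checked numerically in Python. The data and the constants a = 5, k = -3 are hypothetical; `statistics.pstdev` is used because it divides by n, matching the definition above.

```python
import math
import statistics

x = [4, 8, 6, 10, 12]           # hypothetical observations
shifted = [xi + 5 for xi in x]  # change of origin: add a = 5
scaled = [-3 * xi for xi in x]  # change of scale: multiply by k = -3

sd = statistics.pstdev  # population SD (divisor n), matching the definition above

print(sd(x), sd(shifted), sd(scaled))
assert math.isclose(sd(shifted), sd(x))           # origin: SD unchanged
assert math.isclose(sd(scaled), abs(-3) * sd(x))  # scale: SD multiplied by |k|
```

A negative k is used deliberately: the SD is multiplied by |k|, never by k itself, because it is a non-negative quantity.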

UNIT-III

Question 5(a) [2]

Karl Pearson's correlation coefficient and effects of origin/scale.

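It measures the degree and direction of the linear relationship between two variables: r = Cov(X, Y) / (SD_x * SD_y), with -1 <= r <= 1.

Effect: It is independent of change of origin and of scale, except that r changes sign when the two scale factors have opposite signs.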
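The effect of origin and scale on r can be seen numerically with a small Python sketch. The helper `pearson_r`, the paired data and the constants a = 3, h = 2, b = 4, k = ±5 are hypothetical.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient from raw paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
u = [(a - 3) / 2 for a in x]   # U = (X - a) / h with a = 3, h = 2
v = [(b - 4) / 5 for b in y]   # V = (Y - b) / k with b = 4, k = 5
w = [(b - 4) / -5 for b in y]  # same shift but k = -5 (opposite sign to h)

print(round(r_xy, 4), round(pearson_r(u, v), 4), round(pearson_r(u, w), 4))
```

With h and k of the same sign the value of r is unchanged; with opposite signs only its sign flips.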

Question 6(a) [5]

Prove r is independent of change of origin/scale and discuss r=0.

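The correlation coefficient r is unchanged when the variables are transformed as U = (X - a)/h and V = (Y - b)/k with h, k > 0. Proof: U - U_bar = (X - X_bar)/h and V - V_bar = (Y - Y_bar)/k, so Cov(U, V) = Cov(X, Y)/(hk), SD_u = SD_x/|h| and SD_v = SD_y/|k|. Therefore r_UV = [hk / (|h||k|)] * r_XY, which equals r_XY whenever h and k have the same sign.

Independent Variables: If X and Y are independent, then Cov(X, Y) = 0 and hence r = 0. The converse is not always true, because r = 0 only rules out a linear relationship. For example, if X takes the values -2, -1, 0, 1, 2 and Y = X², then r = 0 even though Y is completely determined by X.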
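The quadratic counterexample can be verified directly in Python. The helper `pearson_r` is written here for illustration; the X values are the symmetric set used in the example above.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient from raw paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]      # values symmetric about zero
y = [xi ** 2 for xi in x]  # Y = X^2: Y is fully determined by X
print(pearson_r(x, y))     # 0.0: no *linear* relationship
```

The positive and negative products of deviations cancel exactly because the X values are symmetric about their mean.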

Question 6(c) [5]

Regression coefficient, properties, and why two lines of regression?

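Definition: The regression coefficient of Y on X, byx = Cov(X, Y) / Var(X) = r * (SD_y / SD_x), is the slope of the regression line of Y on X; similarly, bxy = r * (SD_x / SD_y) is the slope of the regression line of X on Y.

Properties:

The geometric mean of the two regression coefficients equals the correlation coefficient in magnitude: |r| = sqrt(bxy * byx), and r takes the common sign of bxy and byx.
If one coefficient is greater than 1 in absolute value, the other must be less than 1 in absolute value, since bxy * byx = r² <= 1.

Why two lines? There are two regression lines because the roles of the dependent and independent variables are interchanged. The line of Y on X minimizes the sum of squares of vertical deviations (in Y), while the line of X on Y minimizes the sum of squares of horizontal deviations (in X). The two lines coincide only when r = ±1.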
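The two coefficients and their relation to r can be computed from raw data with a short Python sketch. The function name `regression_coefficients` and the paired data are hypothetical.

```python
import math

def regression_coefficients(x, y):
    """Return (byx, bxy): slopes of the lines of Y on X and of X on Y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]
byx, bxy = regression_coefficients(x, y)

# |r| is the geometric mean of the coefficients; r shares their common sign
r = math.copysign(math.sqrt(byx * bxy), byx)
print(byx, bxy, r)
```

Both coefficients share the numerator Cov(X, Y) and differ only in the variance they divide by, which is why their product is r².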

UNIT-IV

Question 7(b) [2]

Axiomatic definition of probability and proofs.

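Probability is a set function P, defined on the events of a sample space S, satisfying: (1) P(A) >= 0 for every event A, (2) P(S) = 1, (3) for any sequence of mutually exclusive (pairwise disjoint) events A1, A2, ..., P(Union Ai) = Sum P(Ai).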

Proofs:

P(phi) = 0: Since S and phi are disjoint and S U phi = S, Axiom (3) gives P(S) = P(S U phi) = P(S) + P(phi) => 1 = 1 + P(phi) => P(phi) = 0.
P(not A) = 1 - P(A): Since A and not A are disjoint and A U not A = S, P(A) + P(not A) = P(A U not A) = P(S) = 1 => P(not A) = 1 - P(A).
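The two results can be checked on one concrete finite sample space in Python. The fair-die example, the event A (an even number) and the helper `P` are hypothetical illustrations; this only confirms the results on one example and does not replace the axiomatic proofs.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of one fair die (hypothetical example)

def P(event):
    """Classical probability on S: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}       # event: an even number
not_A = S - A       # complement of A
empty = frozenset() # the impossible event phi

assert P(S) == 1                        # Axiom 2
assert all(P({o}) >= 0 for o in S)      # Axiom 1 on elementary events
assert P(A | not_A) == P(A) + P(not_A)  # Axiom 3: A and not A are disjoint
assert P(S | empty) == P(S) + P(empty)  # S and phi are disjoint
print(P(empty), P(not_A), 1 - P(A))
```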

Question 8(b) [5]

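Four cards are drawn from a pack of 52 cards ("pack of 5" in the source is assumed to be a misprint). Find the probability of getting (i) one Ace, one King, one Queen and one Jack, (ii) two Kings and two Queens, (iii) two black and two red cards.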

Total ways: 52C4 = 270725.

One of each: (4C1 * 4C1 * 4C1 * 4C1) / 52C4 = 256/270725 ≈ 0.00095.

Two Kings, Two Queens: (4C2 * 4C2) / 52C4 = 36/270725 ≈ 0.00013.

Two Black, Two Red: (26C2 * 26C2) / 52C4 = 105625/270725 = 325/833 ≈ 0.3902.
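The three probabilities can be computed exactly in Python with `math.comb` (Python 3.8 or later) and `fractions.Fraction`, which reduces each ratio to lowest terms automatically.

```python
from fractions import Fraction
from math import comb

total = comb(52, 4)  # ways to draw 4 cards from 52

p_one_each = Fraction(comb(4, 1) ** 4, total)               # one Ace, King, Queen, Jack
p_two_k_two_q = Fraction(comb(4, 2) * comb(4, 2), total)    # two Kings, two Queens
p_two_b_two_r = Fraction(comb(26, 2) * comb(26, 2), total)  # two black, two red

for label, p in [("One of each", p_one_each),
                 ("Two Kings, two Queens", p_two_k_two_q),
                 ("Two black, two red", p_two_b_two_r)]:
    print(f"{label}: {p} = {float(p):.6f}")
```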

UNIT-V

Question 9(b) [2]

Multiplication theorem of probability.

For two events A and B: P(A ∩ B) = P(A) * P(B|A), provided P(A) > 0, and equivalently P(A ∩ B) = P(B) * P(A|B), provided P(B) > 0. If A and B are independent, P(A ∩ B) = P(A) * P(B).
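The theorem can be checked by brute-force enumeration in Python. The example is a hypothetical one: two cards drawn in order without replacement, with A = "first card is an Ace" and B = "second card is an Ace". Counting all equally likely ordered draws should give the same answer as P(A) * P(B|A) = (4/52) * (3/51).

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical example: draw two cards in order, without replacement.
deck = [(rank, suit) for rank in range(13) for suit in range(4)]  # rank 0 = Ace
draws = list(permutations(deck, 2))  # all 52 * 51 ordered draws, equally likely

both_aces = sum(1 for first, second in draws if first[0] == 0 and second[0] == 0)
p_enumerated = Fraction(both_aces, len(draws))

# Multiplication theorem: P(A ∩ B) = P(A) * P(B | A)
p_theorem = Fraction(4, 52) * Fraction(3, 51)
print(p_enumerated, p_theorem)
```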

Question 10(a) [5]

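A problem is given to students A, B and C, whose probabilities of solving it are 1/4, 1/2 and 3/4. Find the probability that the problem is solved.

Assuming the students work independently, P(problem is solved) = 1 - P(none of them solves it).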

P(not A) = 1 - 1/4 = 3/4
P(not B) = 1 - 1/2 = 1/2
P(not C) = 1 - 3/4 = 1/4

P(Solved) = 1 - (3/4 * 1/2 * 1/4) = 1 - 3/32 = 29/32.
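The complement method can be cross-checked in Python by enumerating all eight solve/fail outcomes. Like the solution above, the sketch assumes the three students work independently.

```python
from fractions import Fraction
from itertools import product

p = {"A": Fraction(1, 4), "B": Fraction(1, 2), "C": Fraction(3, 4)}

# Complement method (assumes the three students work independently)
p_none = 1
for pi in p.values():
    p_none *= 1 - pi
p_solved = 1 - p_none

# Cross-check: add up every outcome in which at least one student succeeds
p_check = Fraction(0)
for outcome in product([True, False], repeat=3):
    prob = Fraction(1)
    for solved, pi in zip(outcome, p.values()):
        prob *= pi if solved else 1 - pi
    if any(outcome):
        p_check += prob

print(p_solved, p_check)  # 29/32 29/32
```

The complement route needs one product instead of seven terms, which is why it is the preferred method in the exam.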

Question 10(d) [5]

Bayes' Theorem Application (Urns I, II, III).

Let E be the event of drawing 1 White and 1 Red ball.

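Urn I (1W, 2B, 3R): P(E|I) = (1C1 * 3C1) / 6C2 = 3/15 = 1/5
Urn II (2W, 1B, 1R): P(E|II) = (2C1 * 1C1) / 4C2 = 2/6 = 1/3
Urn III (4W, 5B, 3R): P(E|III) = (4C1 * 3C1) / 12C2 = 12/66 = 2/11

Assume P(I) = P(II) = P(III) = 1/3. By Bayes' formula, P(I|E) = P(I)P(E|I) / Sum[P(i)P(E|i)]. The equal priors cancel, and 1/5 + 1/3 + 2/11 = 118/165. Hence P(I|E) = (1/5)/(118/165) = 33/118, P(II|E) = 55/118 and P(III|E) = 30/118 = 15/59.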
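The same Bayes' computation can be sketched in Python with exact fractions. The urn contents come from the question; the equal prior of 1/3 per urn is the same assumption made above.

```python
from fractions import Fraction
from math import comb

# Urn contents from the question: (white, black, red)
urns = {"I": (1, 2, 3), "II": (2, 1, 1), "III": (4, 5, 3)}
prior = Fraction(1, 3)  # assumption: each urn equally likely to be chosen

# Likelihood P(E | urn): one white and one red in a draw of two balls
likelihood = {
    name: Fraction(comb(w, 1) * comb(r, 1), comb(w + b + r, 2))
    for name, (w, b, r) in urns.items()
}
evidence = sum(prior * lk for lk in likelihood.values())  # total probability P(E)
posterior = {name: prior * lk / evidence for name, lk in likelihood.items()}

for name in urns:
    print(f"P({name}|E) = {posterior[name]}")
```

The posteriors always sum to 1, which is a quick check on any hand calculation.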

Exam Focus Enhancements

Exam Tips

For Unit-I diagrams, use a sharp pencil and include proper axis labels (e.g., "Class Boundaries" on X-axis).
In Probability problems, always define your events (A, B, C) clearly before starting calculations.

Important Formulas List

Coefficient of Variation: (Standard Deviation / Mean) * 100%

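Mode (Grouped): L + [(f1 - f0) / (2f1 - f0 - f2)] * h, where L = lower limit of the modal class, f1 = frequency of the modal class, f0 and f2 = frequencies of the classes preceding and following it, and h = width of the modal class.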

Regression Coefficients: bxy = r * (SD_x / SD_y) and byx = r * (SD_y / SD_x)
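Two of the formulas above can be turned into small Python helpers for checking hand calculations. The function names and all data are hypothetical; the CV helper assumes the population SD (divisor n).

```python
from fractions import Fraction
import statistics

def coefficient_of_variation(data):
    """CV = (SD / Mean) * 100, using the population SD."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

def grouped_mode(lower, f1, f0, f2, h):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    return lower + Fraction(f1 - f0, 2 * f1 - f0 - f2) * h

# Hypothetical grouped data: 0-10:5, 10-20:8, 20-30:12, 30-40:7, 40-50:3
# The modal class is 20-30 (highest frequency, 12).
print(grouped_mode(lower=20, f1=12, f0=8, f2=7, h=10))  # 220/9, about 24.44

print(coefficient_of_variation([4, 8, 6, 10, 12]))
```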

Answer Presentation Strategy

When proving properties (like change of origin), use the "let Y = X - a" substitution method as it is standard and earns full marks. For long questions (5 marks), always include a definition, a formula, and a brief explanation.