FYUG Even Semester Exam, 2024

Subject: STATISTICS (Course No: STASEC-151T)

SECTION-A: Objective Questions

Answer any fifteen questions (15 x 1 = 15 Marks)

1. Use of with() command 1 Mark

The with() command is used to evaluate an R expression within an environment constructed from a data frame, allowing you to reference column names directly without using the $ sign.
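
As a small illustration of the point above, the sketch below computes the same quantity with and without with(). The data frame marks and its columns test1 and test2 are hypothetical and used only for this example.

```r
# Hypothetical data frame, used only for illustration
marks <- data.frame(test1 = c(60, 75, 90), test2 = c(70, 85, 80))

# Without with(): every column needs the data frame name and the $ sign
mean_diff_dollar <- mean(marks$test2 - marks$test1)

# With with(): columns are referenced directly by name
mean_diff_with <- with(marks, mean(test2 - test1))

print(mean_diff_with)
```

Both expressions return the same value; with() only changes where R looks up the column names.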

2. Use of by() command 1 Mark

The by() command is a wrapper for tapply() applied to data frames: it splits a data frame by one or more factors and applies a function to each subset.
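
A minimal sketch of by() is given below, assuming a hypothetical data frame scores with a grouping factor group and a numeric column score.

```r
# Hypothetical data frame, used only for illustration
scores <- data.frame(
  group = factor(c("A", "A", "B", "B")),
  score = c(10, 20, 30, 50)
)

# Split the rows by group and apply a function to each subset
group_means <- by(scores, scores$group, function(d) mean(d$score))

print(group_means)
```

The function receives each subset as a data frame, so any column of that subset can be used inside it.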

3. Feature of R 1 Mark

One primary feature of R is that it is open-source software, allowing for extensive community contribution and a vast library of packages.

4. Remove duplicate rows 1 Mark

Duplicate rows can be removed using the unique() function in base R or distinct() in the dplyr package.
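
The base R route can be sketched as follows. The data frame df is hypothetical; the dplyr alternative appears only as a comment because it needs that package to be installed.

```r
# Hypothetical data frame with one repeated row
df <- data.frame(id = c(1, 2, 2, 3), grade = c("A", "B", "B", "C"))

# unique() keeps the first occurrence of each distinct row
dedup_unique <- unique(df)

# Equivalent approach: drop the rows flagged by duplicated()
dedup_index <- df[!duplicated(df), ]

# With dplyr installed, dplyr::distinct(df) is the corresponding call
print(dedup_unique)
```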

5. Package for removing duplicate rows 1 Mark

The dplyr package, which provides the distinct() function, is commonly used for removing duplicate rows in R.

6. Use of scan() function 1 Mark

The scan() function is used to read data into a vector or list from the console or an external file.
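
A short sketch of scan() reading from a text string and from a file is shown below. The temporary file is created only for the example, and quiet = TRUE simply suppresses the "Read n items" message.

```r
# Read numbers from a character string instead of the console
values <- scan(text = "5 7 9 15 23 10", quiet = TRUE)

# Read numbers from an external file (a temporary file created for this example)
tmp <- tempfile(fileext = ".txt")
writeLines("8 9 3 10", tmp)
from_file <- scan(tmp, quiet = TRUE)

print(values)
print(from_file)
```

Calling scan() with no arguments reads values typed at the console until a blank line is entered.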

7. Adding labels in a plot 1 Mark

Labels are added using arguments like main (title), xlab (x-axis), and ylab (y-axis) inside the plot() function.

8. Exporting a plot 1 Mark

A plot is exported by opening a graphical device (e.g., pdf() or png()), generating the plot, and then closing the device with dev.off().
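
The three steps can be sketched as below, using pdf() as the device. The file name scatter.pdf, the plotted values and the label text are assumptions made for the example.

```r
# Step 1: open a graphical device that writes to a file
out_file <- file.path(tempdir(), "scatter.pdf")
pdf(out_file)

# Step 2: draw the plot, with a title and axis labels
plot(c(1, 2, 3, 4), c(2, 4, 3, 5),
     main = "Example scatter plot", xlab = "x values", ylab = "y values")

# Step 3: close the device so that the file is written to disk
invisible(dev.off())

print(file.exists(out_file))
```

Replacing pdf() with png() follows the same pattern, provided the R installation supports that device.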

9. Normal Probability Plot 1 Mark

A normal probability plot is a graphical technique to assess whether a dataset is approximately normally distributed.

10. Function for QQ plot 1 Mark

The qqnorm() function is used to produce a normal QQ plot.
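
A minimal sketch is given below, assuming simulated data from rnorm() with a fixed seed. qqline() is a companion function that adds a reference line to the plot.

```r
# Simulated data, used only for illustration
set.seed(1)
sample_data <- rnorm(50)

# Draw the normal QQ plot; the plotted coordinates are returned invisibly
qq <- qqnorm(sample_data, main = "Normal Q-Q Plot")

# Add a reference line through the first and third quartiles
qqline(sample_data)
```

Points lying close to the reference line suggest that the data are approximately normal.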

11. Relation between Median and Percentile 1 Mark

The Median is equal to the 50th Percentile.
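
This can be checked in R with the data of Question 26. The sketch assumes the default quantile() method (type 7); some of the other types can return a slightly different 50% value.

```r
x <- c(5, 7, 9, 15, 23, 10)

med <- median(x)                 # middle value of the sorted data
p50 <- quantile(x, probs = 0.5)  # 50th percentile, default type 7

print(med)
print(p50)
```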

12. Function for Variance 1 Mark

The var() function is used to find the sample variance (divisor n - 1) in R.
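
The sketch below compares var() with a direct calculation that uses the divisor n - 1. The vector x is a hypothetical example.

```r
# Hypothetical data, used only for illustration
x <- c(2, 4, 6, 8)

# Direct calculation: sum of squared deviations divided by (n - 1)
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)

# Built-in function
builtin_var <- var(x)

print(c(manual = manual_var, builtin = builtin_var))
```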

13. Package for Skewness 1 Mark

The moments package is used to compute skewness.
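
Because the moments package may not be installed, the sketch below computes the moment coefficient of skewness, m3 / m2^(3/2), in base R. This is assumed to be the quantity that skewness() in moments reports. The data are those of Question 35.

```r
x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
n <- length(x)

# Second and third central moments (divisor n)
m2 <- sum((x - mean(x))^2) / n
m3 <- sum((x - mean(x))^3) / n

# Moment coefficient of skewness
skew_manual <- m3 / m2^(3 / 2)

# With the package installed, moments::skewness(x) is the one-line equivalent
print(skew_manual)
```

A positive value indicates a longer right tail, which here is caused by the large observation 22.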

14. Correlation Coefficient Range 1 Mark

The correlation coefficient lies between -1 and +1.

15. Function for Covariance 1 Mark

The cov() function is used to compute covariance.
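
The link between covariance and correlation can be sketched with the data of Question 40: the correlation coefficient is the covariance divided by the product of the two standard deviations, which is why it stays between -1 and +1.

```r
x <- c(4, 2, 9, 3, 2)
y <- c(7, 5, 3, 2, 11)

cov_xy <- cov(x, y)   # sample covariance
r_xy <- cor(x, y)     # Pearson correlation coefficient

# Correlation rebuilt from the covariance and the standard deviations
r_manual <- cov_xy / (sd(x) * sd(y))

print(c(covariance = cov_xy, correlation = r_xy))
```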

SECTION-B: Short Answer Questions

Answer any five questions (5 x 2 = 10 Marks)

21. What is R and its advantages? 2 Marks

R is a programming language and free software environment for statistical computing and graphics.
Advantages:
  • Free and open-source software.
  • Comprehensive library of packages for almost every statistical task.

24. Define Boxplot 2 Marks

A boxplot (box-and-whisker plot) displays the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
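
A short sketch with the data of Question 29 is given below. fivenum() returns Tukey's five-number summary, whose hinges can differ slightly from the quartiles given by quantile(). The title and axis label are assumptions for the example.

```r
x <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Draw the boxplot
boxplot(x, main = "Boxplot of x", ylab = "Value")
```

By default, boxplot() draws each whisker only as far as the most extreme point within 1.5 times the box length and plots points beyond that separately.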

26. R-code for 60th and 85th Percentile 2 Marks

For data: 5, 7, 9, 15, 23, 10
data <- c(5, 7, 9, 15, 23, 10)
quantile(data, probs = c(0.60, 0.85))

29. R-code for Range 2 Marks

For data: 8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11
x <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)
range_val <- max(x) - min(x)
print(range_val)

SECTION-C: Descriptive & Numerical Questions

Answer any five questions (5 x 5 = 25 Marks)

33. R-code for Data Visualization 5 Marks

Codes for different plot types:
# height_vector, slices_vector and data_vector are placeholder numeric vectors
# (a) Bar diagram
barplot(height_vector)
# (b) Pie diagram
pie(slices_vector)
# (c) Histogram
hist(data_vector)
# (d) Frequency polygon
h <- hist(data_vector, plot = FALSE)
plot(h$mids, h$counts, type = "b")

35. R-code for Summary Statistics 5 Marks

For data: 7, 3, 5, 9, 22, 7, 11, 13, 15, 10
x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
mean_val <- mean(x)
median_val <- median(x)
variance_val <- var(x)
sd_val <- sd(x)
se_val <- sd(x) / sqrt(length(x))
# Output:
list(Mean = mean_val, Median = median_val, Var = variance_val, SD = sd_val, SE = se_val)
Formulas Used:
Mean: x̄ = (Σx)/n
Variance: s² = Σ(x-x̄)²/(n-1)
SD: s = √s²
Standard Error: SE = s/√n

40. Simple Linear Regression Model 5 Marks

Simple Linear Regression: Models the relationship between one dependent variable (y) and one independent variable (x).
Multiple Linear Regression: Models one dependent variable using two or more independent variables.

For model y = a + bx with data x: 4, 2, 9, 3, 2 and y: 7, 5, 3, 2, 11:
y <- c(7, 5, 3, 2, 11)
x <- c(4, 2, 9, 3, 2)
model <- lm(y ~ x)
print(coef(model))  # Returns Intercept (a) and Slope (b)
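
As a check on the fitted coefficients, the least-squares formulas b = Σ(x-x̄)(y-ȳ)/Σ(x-x̄)² and a = ȳ - b·x̄ can be evaluated directly. The sketch below uses the same data; the names a, b and fit are chosen only for this example.

```r
x <- c(4, 2, 9, 3, 2)
y <- c(7, 5, 3, 2, 11)

# Slope: sum of cross-products divided by sum of squares of x
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point of means
a <- mean(y) - b * mean(x)

# Coefficients from lm() for comparison
fit <- coef(lm(y ~ x))

print(c(a = a, b = b))
print(fit)
```

The two results should agree up to rounding, with a negative slope for this data.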