FYUG Even Semester Exam, 2024
STATISTICS (2nd Semester)
Statistical Data Analysis using R

Course No: STASEC-151T
Full Marks: 50 | Time: 2 Hours
Note: Comprehensive solutions for Sections A, B, and C.

SECTION-A (1 x 15 = 15 Marks)

1. Use of with() command in R.

The with() function evaluates an expression inside a data frame (or list), so column names can be referenced directly without the $ operator.
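As a minimal sketch, the following compares the $ style with with(). The data frame df and its columns are hypothetical and used only for illustration.

```r
# Hypothetical data frame, not part of the question paper
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# Without with(): every column needs the df$ prefix
ratio_dollar <- mean(df$weight / df$height)

# With with(): columns are referenced by name inside the expression
ratio_with <- with(df, mean(weight / height))

ratio_with
```

Both forms compute the same value; with() only shortens how the columns are written.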

2. Use of by() command in R.

The by() function is used to apply a function to a data frame split by factors (similar to a grouped "apply" operation).
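A short sketch of a grouped calculation with by(), assuming a hypothetical data frame scores with a grouping factor and a numeric column:

```r
# Hypothetical marks for two groups, for illustration only
scores <- data.frame(
  group = factor(c("A", "A", "B", "B")),
  marks = c(10, 20, 30, 50)
)

# Split the rows by group and apply a function to each piece
group_means <- by(scores, scores$group, function(d) mean(d$marks))

group_means
```

Here the anonymous function receives one sub-data-frame per group and returns the mean of its marks column.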

3. One feature of R.

R is an open-source programming language specifically designed for statistical computing and graphics.

4. How to remove duplicate rows in R?

Use the unique() function or distinct() from the dplyr package.
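The base R route can be sketched as follows, using a hypothetical data frame with one repeated row. The dplyr call is shown only as a comment because it needs the package to be installed.

```r
# Hypothetical data with one duplicated row (rows 2 and 3 are identical)
df <- data.frame(id = c(1, 2, 2, 3), score = c(90, 85, 85, 70))

# unique() keeps the first copy of each repeated row
deduped <- unique(df)

# Equivalent form using duplicated() as a logical filter
deduped_alt <- df[!duplicated(df), ]

# With dplyr installed: dplyr::distinct(df)
deduped
```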

5. Which package is used to remove duplicate rows?

The dplyr package is commonly used (with the distinct() function).

6. Use of scan() function in R.

The scan() function is used to read data into a vector or list from the console or a file.
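As a small illustration, scan() can also read from a character string through its text argument, which avoids needing a file or typed input. The numbers below are made up for the example.

```r
# Read whitespace-separated numbers from a string into a numeric vector
values <- scan(text = "3 5 7 9", quiet = TRUE)

values
```

Calling scan() with no arguments reads from the console instead, and scan("file.txt") would read from a file of that (hypothetical) name.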

7. How to add labels in a plot?

Labels are added using arguments like xlab (x-axis), ylab (y-axis), and main (title) within plotting functions.

8. How to export a plot in R?

Plots can be exported by opening a device (e.g., png(), pdf()), plotting, and then closing it with dev.off().
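The open, plot, close sequence might look like this. The sketch writes to a temporary PDF file so that it does not depend on any particular folder; the plotted values are illustrative only.

```r
# Temporary output path, chosen only for this example
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)                      # 1. open the graphics device
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "x squared",
     main = "Example plot")        # 2. draw the plot
invisible(dev.off())               # 3. close the device to write the file

file.exists(out_file)
```

Replacing pdf() with png() or jpeg() follows the same pattern.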

9. What is a normal probability plot?

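A plot of the ordered data against the corresponding theoretical normal quantiles (a normal Q-Q plot), used to assess whether a dataset follows a normal distribution. Points lying close to a straight line suggest normality.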

10. Which function is used to draw a QQ plot?

The qqnorm() and qqline() functions.
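A minimal sketch, assuming simulated data: qqnorm() draws the points and qqline() adds the reference line through the quartiles.

```r
set.seed(1)                  # make the simulated sample reproducible
sample_data <- rnorm(50)     # simulated values, for illustration only

# qqnorm() draws the plot and invisibly returns the plotted coordinates
qq <- qqnorm(sample_data, main = "Normal Q-Q Plot")

# qqline() adds a reference line to the existing plot
qqline(sample_data, col = "red")
```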

11. Relation between median and percentile.

The median is equivalent to the 50th percentile.
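This can be checked in R with the data from question 26; both calls below should return the same value.

```r
x <- c(5, 7, 9, 15, 23, 10)

med <- median(x)                        # middle of the sorted data
p50 <- unname(quantile(x, probs = 0.5)) # 50th percentile

c(median = med, percentile_50 = p50)
```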

12. Function to find variance in R.

The var() function.

13. Package used to compute skewness.

The moments or e1071 packages.
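If neither package is available, the moment-based coefficient m3 / m2^(3/2) can be sketched in base R. The helper skewness_moment() and both data vectors are hypothetical; package functions may use a different formula or offer several types, so results need not match exactly.

```r
# Hypothetical helper: moment coefficient of skewness in base R
skewness_moment <- function(x) {
  x <- x[!is.na(x)]                  # drop missing values
  n <- length(x)
  m2 <- sum((x - mean(x))^2) / n     # second central moment
  m3 <- sum((x - mean(x))^3) / n     # third central moment
  m3 / m2^(3 / 2)
}

symmetric    <- c(1, 2, 3, 4, 5)       # illustrative data
right_skewed <- c(1, 1, 2, 2, 3, 10)   # illustrative data

skew_sym   <- skewness_moment(symmetric)
skew_right <- skewness_moment(right_skewed)
```

A symmetric sample gives a value of zero, while a long right tail gives a positive value.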

14. Correlation coefficient lies between ______.

-1 and +1.

15. Function to compute covariance.

The cov() function.
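A brief sketch with made-up data, which also shows how covariance relates to the correlation coefficient from question 14: dividing cov(x, y) by the product of the standard deviations gives cor(x, y).

```r
# Hypothetical paired data, for illustration only
hours <- c(2, 4, 6, 8, 10)
marks <- c(1, 3, 2, 5, 4)

cov_hm <- cov(hours, marks)                 # sample covariance
r_hm   <- cov_hm / (sd(hours) * sd(marks))  # rescaled to lie in [-1, 1]

c(covariance = cov_hm, correlation = r_hm)
```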

SECTION-B (2 x 5 = 10 Marks)

21. What is R? Write two advantages.

R is a language and environment for statistical computing and graphics.

  • Advantage 1: It is free and open-source with a massive community.
  • Advantage 2: It has excellent data visualization capabilities (ggplot2, etc.).

23. Define xlab, ylab, col, main in R.

These are graphical parameters used in visualization:

  • xlab: Label for the x-axis.
  • ylab: Label for the y-axis.
  • col: Sets the color of the plot elements.
  • main: Sets the main title of the plot.
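The four parameters can be seen together in a single plot() call. The vectors hours and marks are hypothetical values used only to have something to draw.

```r
# Hypothetical data, for illustration only
hours <- c(1, 2, 3, 4, 5)
marks <- c(35, 48, 52, 66, 71)

plot(hours, marks,
     xlab = "Hours studied",       # x-axis label
     ylab = "Marks obtained",      # y-axis label
     col  = "blue",                # colour of the points
     main = "Marks vs. Hours")     # main title
```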

24. Define Boxplot with explaining its parts.

A boxplot summarizes data using a "box and whiskers" approach:

  • Median: The line inside the box.
  • Box: Represents the Interquartile Range (IQR) from Q1 to Q3.
  • Whiskers: Lines extending from the box to the most extreme data values that lie within 1.5*IQR of the box edges.
  • Outliers: Points plotted individually beyond the whiskers.
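The parts listed above can be read off numerically with boxplot.stats(), sketched here on a hypothetical sample that contains one extreme value.

```r
# Hypothetical sample with one unusually large value
obs <- c(2, 4, 5, 6, 7, 8, 30)

bs <- boxplot.stats(obs)

bs$stats   # lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
bs$out     # points beyond the whiskers (outliers)

boxplot(obs, main = "Boxplot of obs", ylab = "Value")
```

For this sample the value 30 lies more than 1.5*IQR above the box, so it is drawn as a separate point and the upper whisker stops at 8.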

26. R-code for 60th and 85th percentile (Data: 5, 7, 9, 15, 23, 10).

data <- c(5, 7, 9, 15, 23, 10)
quantile(data, probs = c(0.60, 0.85)) # Result: 60th percentile = 10, 85th percentile = 17 (default method)

29. R-code for computing range (Data: 8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11).

vals <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)
diff(range(vals)) # Returns the single range value (Max - Min) = 17 - 2 = 15

SECTION-C (5 x 5 = 25 Marks)

33. R-code for Bar, Pie, Histogram, and Frequency Polygon.

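# data_vector stands for any numeric vector of observations
# (a) Bar Diagram
barplot(data_vector)
# (b) Pie Diagram
pie(data_vector)
# (c) Histogram
hist(data_vector)
# (d) Frequency Polygon: class midpoints joined by line segments
h <- hist(data_vector, plot = FALSE)
plot(h$mids, h$counts, type = "b", xlab = "Class midpoint", ylab = "Frequency")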

35. R-code for Mean, Median, Variance, SD, and SE.

Data: 7, 3, 5, 9, 22, 7, 11, 13, 15, 10

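x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
mean(x)    # Mean
median(x)  # Median
var(x)     # Sample variance (divisor n - 1)
sd(x)      # Standard deviation
se <- sd(x) / sqrt(length(x))  # Standard error of the mean
se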

37. R-code for Karl Pearson and Spearman Correlation.

Data X: 10, 15, 7, 13, 5, 17, 12, 21, 11
Data Y: 80, 120, 45, 33, 37, 11, 15, 21

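# As printed, X has 9 values and Y has 8. cor() needs vectors of equal length,
# so check the data against the question paper before running.
x <- c(10, 15, 7, 13, 5, 17, 12, 21, 11)
y <- c(80, 120, 45, 33, 37, 11, 15, 21)
# Pearson
cor(x, y, method = "pearson")
# Spearman Rank
cor(x, y, method = "spearman")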

40. Simple vs Multiple Linear Regression Model.

Simple Linear Regression: Models the relationship between one dependent variable (Y) and one independent variable (X). Formula: y = a + bx.

Multiple Linear Regression: Models the relationship between one dependent variable and two or more independent variables.
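A multiple regression fit might be sketched as follows. The data frame study is hypothetical, and y is constructed without noise as 1 + 2*x1 + 3*x2 so that the fitted coefficients can be checked by eye.

```r
# Hypothetical predictors, for illustration only
study <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)
# Response built from a known linear rule (no random error)
study$y <- 1 + 2 * study$x1 + 3 * study$x2

# Two predictors on the right-hand side of the formula
fit <- lm(y ~ x1 + x2, data = study)
coefs <- coef(fit)   # intercept, coefficient of x1, coefficient of x2

coefs
```

The only change from the simple model is the extra term in the formula; real data would include random error, so the fit would not be exact.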

R-code for fitting y = a + bx:

y <- c(7, 5, 3, 2, 11)
x <- c(4, 2, 9, 3, 2)
model <- lm(y ~ x)
coef(model) # Returns Intercept (a) and Slope (b)