FYUG Even Semester Exam, 2024

Subject: STATISTICS (Course No: STASEC-151T)

Paper Name: Statistical Data Analysis using R

Full Marks: 50

Time Duration: 2 Hours

Pass Marks: 20

SECTION-A: Objective Questions

Answer any fifteen questions (15 x 1 = 15 Marks)

1. Use of with() command 1 Mark

The with() command is used to evaluate an R expression within an environment constructed from a data frame, allowing you to reference column names directly without using the $ sign.
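The following is a minimal sketch using R's built-in mtcars data set, chosen only for illustration. Both lines compute the same mean, but with() lets the column name be used directly.

```r
# Without with(): every column needs the data frame name and $
mean(mtcars$mpg)

# With with(): columns of mtcars are visible by name inside the expression
with(mtcars, mean(mpg))

# Longer expressions work too, e.g. the correlation of weight and mileage
with(mtcars, cor(wt, mpg))
```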

2. Use of by() command 1 Mark

The by() command is an object-oriented wrapper for tapply() applied to data frames: it splits a data frame by one or more factors and applies a function to each subset.
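The sketch below assumes the built-in mtcars data set for illustration. It splits the rows by number of cylinders and computes the mean mileage in each group, then shows the same group means obtained with tapply().

```r
# Split mtcars by number of cylinders; apply the function to each subset
res <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))
print(res)

# The same group means via tapply(), which by() wraps
tapply(mtcars$mpg, mtcars$cyl, mean)
```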

3. Feature of R 1 Mark

One primary feature of R is that it is open-source software, allowing for extensive community contribution and a vast library of packages.

4. Remove duplicate rows 1 Mark

Duplicate rows can be removed using the unique() function in base R or distinct() in the dplyr package.
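A minimal base-R sketch follows. The small data frame is hypothetical, built so that one row (id 2) appears twice.

```r
# Hypothetical data frame with one repeated row (id 2)
df <- data.frame(id = c(1, 2, 2, 3), score = c(10, 20, 20, 30))

# Base R: keep only the distinct rows
unique(df)

# Equivalent: drop rows flagged as duplicates of an earlier row
df[!duplicated(df), ]

# With dplyr installed, dplyr::distinct(df) keeps the same rows
```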

5. Package for removing duplicate rows 1 Mark

The dplyr package, through its distinct() function, is a widely used choice for removing duplicate rows in R; in base R no extra package is needed (unique() or duplicated()).

6. Use of scan() function 1 Mark

The scan() function is used to read data into a vector or list from the console or an external file.
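So that the example runs without typing at the keyboard, the sketch below reads numbers from a character string through scan()'s text argument. The file name in the comment is hypothetical.

```r
# Read numbers from a character string instead of the console
x <- scan(text = "4 8 15 16 23 42", quiet = TRUE)
print(x)

# Interactively, x <- scan() reads values typed at the console until a blank line;
# scan("data.txt") would read from a file (hypothetical file name)
```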

7. Adding labels in a plot 1 Mark

Labels are added using arguments like main (title), xlab (x-axis), and ylab (y-axis) inside the plot() function.
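A short sketch with hypothetical data shows the three labelling arguments together.

```r
# Hypothetical example data
x <- 1:10
y <- x^2

plot(x, y,
     main = "Square Numbers",   # plot title
     xlab = "x value",          # x-axis label
     ylab = "x squared")        # y-axis label
```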

8. Exporting a plot 1 Mark

A plot is exported by opening a graphical device (e.g., pdf() or png()), generating the plot, and then closing the device with dev.off().
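The open-plot-close sequence can be sketched as below. The file name and data are hypothetical, and the file is written to a temporary directory. png() follows the same pattern, with width and height given in pixels instead of inches.

```r
# Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "myplot.pdf")

pdf(out_file, width = 6, height = 4)   # 1. open the PDF device (size in inches)
hist(c(2, 3, 3, 5, 7, 8, 8, 8, 10),    # 2. draw the plot on that device
     main = "Exported Histogram")
dev.off()                              # 3. close the device so the file is saved
```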

9. Normal Probability Plot 1 Mark

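A normal probability plot is a graphical technique for assessing whether a dataset is approximately normally distributed: the ordered sample values are plotted against the corresponding theoretical normal quantiles, and points lying close to a straight line suggest normality.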

10. Function for QQ plot 1 Mark

The qqnorm() function is used to produce a normal QQ plot; qqline() can then add a reference line through the first and third quartiles.
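A minimal sketch follows. The sample is simulated with rnorm() purely for illustration, so its points should lie roughly along the reference line.

```r
set.seed(1)              # make the simulated sample reproducible
x <- rnorm(50)           # simulated normal sample (illustration only)

qqnorm(x)                # sample quantiles vs theoretical normal quantiles
qqline(x, col = "red")   # reference line through the quartiles
```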

11. Relation between Median and Percentile 1 Mark

The Median is equal to the 50th Percentile.
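The following quick check reuses the data from Question 26. With R's default quantile definition (type 7), the 50th percentile equals the median.

```r
data <- c(5, 7, 9, 15, 23, 10)
median(data)                 # 9.5
quantile(data, probs = 0.5)  # 50th percentile, also 9.5
```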

12. Function for Variance 1 Mark

The var() function is used to find variance in R.

13. Package for Skewness 1 Mark

The moments package is used to compute skewness.
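The sketch below shows how the package would be called. It also includes a base-R version of the moment coefficient of skewness, m3 / m2^(3/2) with denominator n. The assumption is that moments::skewness() uses this same definition. The helper name skew_moment is hypothetical.

```r
# With the moments package installed:
# library(moments); skewness(x)

# Base-R sketch of the moment coefficient of skewness (hypothetical helper)
skew_moment <- function(x) {
  d <- x - mean(x)                # deviations from the mean
  mean(d^3) / mean(d^2)^(3/2)     # third moment / (second moment)^(3/2)
}

skew_moment(c(1, 2, 3, 4, 5))     # symmetric data: 0
skew_moment(c(1, 1, 1, 2, 10))    # long right tail: positive
```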

14. Correlation Coefficient Range 1 Mark

The correlation coefficient lies between -1 and +1.
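The small illustrative vectors below show both limits of the range and one value in between.

```r
x <- c(1, 2, 3, 4, 5)
cor(x, 2 * x + 1)          # perfect positive linear relation: 1
cor(x, -3 * x)             # perfect negative linear relation: -1
cor(x, c(2, 1, 4, 3, 5))   # in between: 0.8
```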

15. Function for Covariance 1 Mark

The cov() function is used to compute covariance.
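The following sketch uses illustrative data. It checks cov() against the sample-covariance formula Σ(x-x̄)(y-ȳ)/(n-1) and shows how covariance scales to the correlation coefficient.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

cov(x, y)                                              # 2
sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)   # same value from the formula
cov(x, y) / (sd(x) * sd(y))                            # equals cor(x, y) = 0.8
```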

SECTION-B: Short Answer Questions

Answer any five questions (5 x 2 = 10 Marks)

21. What is R and its advantages? 2 Marks

R is a programming language and free software environment for statistical computing and graphics.
Advantages:

Free and open-source software.

Comprehensive library of packages for almost every statistical task.

24. Define Boxplot 2 Marks

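A boxplot (box-and-whisker plot) displays the distribution of data based on the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. In R's boxplot(), the whiskers stop at the most extreme values lying within 1.5 times the box length (IQR) from the box, and any points beyond them are drawn individually as outliers.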
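The sketch below reuses the data from Question 29. Note that fivenum() uses Tukey's hinges, which can differ slightly from the quartiles returned by quantile(). For this data, the lower hinge is 5 while quantile(x, 0.25) gives 6.

```r
x <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)

fivenum(x)   # Tukey's five-number summary: 2, 5, 8.5, 11.5, 17
boxplot(x, main = "Boxplot", horizontal = TRUE)
```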

26. R-code for 60th and 85th Percentile 2 Marks

For data: 5, 7, 9, 15, 23, 10

data <- c(5, 7, 9, 15, 23, 10)
# Default (type 7) sample percentiles; output: 60% = 10, 85% = 17
quantile(data, probs = c(0.60, 0.85))

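To show the arithmetic behind quantile()'s default (type 7) rule, the sketch below sorts the data and finds the position h = (n-1)p + 1. It then interpolates linearly between the two neighbouring values. The helper name pct7 is hypothetical. For p = 0.85, h = 5.25, so the result is 15 + 0.25 × (23 - 15) = 17.

```r
pct7 <- function(x, p) {                 # hypothetical helper: type-7 rule
  x <- sort(x)
  h <- (length(x) - 1) * p + 1           # position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])     # linear interpolation
}

data <- c(5, 7, 9, 15, 23, 10)
pct7(data, 0.60)   # 10
pct7(data, 0.85)   # 17
```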
29. R-code for Range 2 Marks

For data: 8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11

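x <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)
# Range = largest value - smallest value = 17 - 2 = 15
range_val <- max(x) - min(x)
print(range_val)
# Note: range(x) returns c(min, max), so diff(range(x)) also gives 15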

SECTION-C: Descriptive & Numerical Questions

Answer any five questions (5 x 5 = 25 Marks)

33. R-code for Data Visualization 5 Marks

Codes for different plot types:

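# Hypothetical example data (replace with your own vectors)
height_vector <- c(A = 12, B = 18, C = 7)   # counts per category
slices_vector <- c(30, 45, 25)              # parts of a whole
data_vector <- c(12, 15, 21, 22, 25, 28, 30, 34, 35, 41)

# (a) Bar diagram
barplot(height_vector, main = "Bar Diagram")

# (b) Pie diagram
pie(slices_vector, labels = c("X", "Y", "Z"), main = "Pie Diagram")

# (c) Histogram
hist(data_vector, main = "Histogram")

# (d) Frequency polygon: join the class midpoints of the histogram
h <- hist(data_vector, plot = FALSE)
plot(h$mids, h$counts, type = "b", xlab = "Class midpoint", ylab = "Frequency")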

35. R-code for Summary Statistics 5 Marks

For data: 7, 3, 5, 9, 22, 7, 11, 13, 15, 10

x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
mean_val <- mean(x)                 # 10.2
median_val <- median(x)             # 9.5
variance_val <- var(x)              # sample variance (n-1), about 30.18
sd_val <- sd(x)                     # about 5.49
se_val <- sd_val / sqrt(length(x))  # standard error, about 1.74

# Print all results together
list(Mean=mean_val, Median=median_val, Var=variance_val, SD=sd_val, SE=se_val)

Formulas Used:

Mean: x̄ = (Σx)/n
Variance: s² = Σ(x-x̄)²/(n-1)
SD: s = √s²
Standard Error: SE = s/√n
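The sketch below applies the formulas above step by step to the same data. This makes each intermediate value visible: Σx = 102, and Σ(x-x̄)² = 271.6. The results should match the built-in functions.

```r
x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
n <- length(x)

x_bar <- sum(x) / n                     # Mean: 102 / 10 = 10.2
s2 <- sum((x - x_bar)^2) / (n - 1)      # Variance: 271.6 / 9
s <- sqrt(s2)                           # Standard deviation
se <- s / sqrt(n)                       # Standard error

c(mean = x_bar, var = s2, sd = s, se = se)
```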

40. Simple Linear Regression Model 5 Marks

Simple Linear Regression: Models the relationship between one dependent variable (y) and one independent variable (x).
Multiple Linear Regression: Models one dependent variable using two or more independent variables.

For model y = a + bx with data x: 4, 2, 9, 3, 2 and y: 7, 5, 3, 2, 11:

y <- c(7, 5, 3, 2, 11)
x <- c(4, 2, 9, 3, 2)
model <- lm(y ~ x)   # fit y = a + bx by least squares
print(coef(model))   # Intercept (a) about 7.835, slope (b) about -0.559

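The least-squares estimates can also be computed by hand from the standard formulas b = Sxy / Sxx and a = ȳ - b x̄. Here, Sxy = -19 and Sxx = 34. A sketch for the same data:

```r
x <- c(4, 2, 9, 3, 2)
y <- c(7, 5, 3, 2, 11)

Sxy <- sum((x - mean(x)) * (y - mean(y)))   # -19
Sxx <- sum((x - mean(x))^2)                 # 34
b <- Sxy / Sxx                              # slope, about -0.559
a <- mean(y) - b * mean(x)                  # intercept, about 7.835

c(a = a, b = b)
coef(lm(y ~ x))   # should agree with a and b
```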
Exam Strategy & Tips

Precision in Code: Ensure parentheses are closed and vectors are defined using c().
Marks Weightage: For Section-C, always provide both the theory/formulas and the R-code.
Missing Packages: In the exam, mention if a library needs to be loaded (e.g., library(moments) for skewness).

Important Formulas List

Standard Error: SE = SD / sqrt(n)
Range: Max - Min

Weighted Mean: Σ(w*x) / Σw
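The following is a quick illustration with hypothetical marks and weights. It shows the formula alongside base R's weighted.mean().

```r
x <- c(70, 80, 90)    # hypothetical marks
w <- c(2, 3, 5)       # hypothetical weights

sum(w * x) / sum(w)   # formula: 830 / 10 = 83
weighted.mean(x, w)   # built-in function, also 83
```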