FYUG Even Semester Exam, 2024
STATISTICS (2nd Semester)
Statistical Data Analysis using R
Course No: STASEC-151T
Full Marks: 50 | Time: 2 Hours
Note: Solutions for Sections A, B, and C.
SECTION-A (1 x 15 = 15 Marks)
1. Use of with() command in R.
The with() function evaluates an expression in the context of a data frame (or list), so its columns can be referenced directly by name without the $ operator.
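A small illustration using R's built-in mtcars data frame (chosen only for convenience): both lines compute the same quantity, but with() removes the repeated mtcars$ prefixes.

```r
# Without with(): every column needs the data$column prefix
avg_a <- mean(mtcars$mpg / mtcars$wt)

# With with(): the columns of mtcars are visible as plain names
avg_b <- with(mtcars, mean(mpg / wt))

print(avg_b)
```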
2. Use of by() command in R.
The by() function applies a function to subsets of a data frame defined by one or more factors, i.e. a grouped "apply" operation (similar to tapply(), but working on whole data frames).
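A minimal sketch on the built-in mtcars data (an illustrative choice): the data frame is split by number of cylinders and the mean mpg of each group is returned.

```r
# Split mtcars by cyl (4, 6, 8) and apply the function to each sub-data-frame
res <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))
print(res)
```

The result matches tapply(mtcars$mpg, mtcars$cyl, mean); by() is more useful when the function needs several columns of each group at once.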
3. One feature of R.
R is free, open-source software designed specifically for statistical computing and graphics.
4. How to remove duplicate rows in R?
In base R, use unique() (or index with !duplicated()); alternatively, use distinct() from the dplyr package.
5. Which package is used to remove duplicate rows?
The dplyr package, through its distinct() function (base R's unique() needs no extra package).
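A sketch of the options for Questions 4 and 5, on a hypothetical toy data frame in which row 3 repeats row 1. The dplyr call runs only if that package is installed.

```r
# Hypothetical data: row 3 is an exact copy of row 1
df <- data.frame(id = c(1, 2, 1, 3), score = c(10, 20, 10, 30))

# Base R: unique() keeps the first occurrence of each distinct row
u1 <- unique(df)

# Base R alternative: drop rows flagged TRUE by duplicated()
u2 <- df[!duplicated(df), ]

# dplyr: distinct() returns the same rows (skipped if dplyr is missing)
if (requireNamespace("dplyr", quietly = TRUE)) {
  u3 <- dplyr::distinct(df)
  print(u3)
}

print(u1)
```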
6. Use of scan() function in R.
The scan() function reads data into a vector or list, either from the console (keyboard) or from a file.
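Interactively, x <- scan() waits for values typed at the console and stops at an empty line. The sketch below passes a string through scan()'s text argument instead, only so that it runs without keyboard input; the numbers are arbitrary.

```r
# Read whitespace-separated numbers; what = double() is the default type
v <- scan(text = "4 8 15 16 23 42", quiet = TRUE)
print(v)
```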
7. How to add labels in a plot?
Labels are added with the arguments xlab (x-axis label), ylab (y-axis label) and main (title) inside plotting functions such as plot(), or afterwards with title().
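An illustrative call on the built-in mtcars data. pdf(NULL) opens a null device so the sketch runs without a screen; it is not needed in an interactive session.

```r
pdf(NULL)  # null device: nothing is displayed or saved

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",   # x-axis label
     ylab = "Miles per gallon",    # y-axis label
     main = "Mileage vs Weight")   # plot title

title(sub = "Source: mtcars")      # title() can add labels after plotting

dev.off()
```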
8. How to export a plot in R?
Open a file graphics device such as png() or pdf(), draw the plot, and then close the device with dev.off() so the file is written.
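The three steps as a sketch. The file name is hypothetical and is placed in a temporary directory. pdf() is used because it works without bitmap support. png("file.png") follows exactly the same pattern.

```r
out_file <- file.path(tempdir(), "mileage.pdf")   # hypothetical output path

pdf(out_file, width = 6, height = 4)              # 1. open the file device
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")  # 2. draw
dev.off()                                         # 3. close: file is written

file.exists(out_file)
```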
9. What is a normal probability plot?
A normal probability plot (normal Q-Q plot) plots the sample quantiles against the theoretical quantiles of a normal distribution. If the data are normal, the points lie close to a straight line.
10. Which function is used to draw a QQ plot?
qqnorm() draws the normal Q-Q plot, and qqline() adds the reference line.
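A sketch on simulated normal data (seeded so it is reproducible). qqnorm() also invisibly returns the plotted coordinates. Their correlation is close to 1 when the data are normal. pdf(NULL) is there only so the code runs headless.

```r
set.seed(1)
z <- rnorm(50)                 # simulated sample from a normal distribution

pdf(NULL)
qq <- qqnorm(z, main = "Normal Q-Q plot")  # x: theoretical, y: sample quantiles
qqline(z, col = "red")         # line through the first and third quartiles
dev.off()

cor(qq$x, qq$y)                # near 1 for normal data
```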
11. Relation between median and percentile.
The median is the 50th percentile (equivalently, the second quartile Q2).
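A quick check on hypothetical data. With R's default quantile type (type = 7), the 50th percentile equals median() exactly. Some other types, for example type = 1, can give a different value for even n.

```r
x <- c(12, 7, 3, 9, 15, 4)          # hypothetical data; sorted: 3 4 7 9 12 15

m   <- median(x)                    # (7 + 9) / 2 = 8
p50 <- quantile(x, probs = 0.50)    # 50th percentile, default type = 7

c(median = m, p50 = unname(p50))
```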
12. Function to find variance in R.
The var() function (it computes the sample variance with divisor n - 1).
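A sketch on hypothetical data showing that var() uses the n - 1 divisor, and how to rescale it to the population variance (divisor n).

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data, mean = 5
n <- length(x)

v_r      <- var(x)                             # 32 / 7, sample variance
v_manual <- sum((x - mean(x))^2) / (n - 1)     # same formula by hand
v_pop    <- v_r * (n - 1) / n                  # 32 / 8 = 4, population variance

c(v_r, v_manual, v_pop)
```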
13. Package used to compute skewness.
The moments package (or e1071); both provide a skewness() function.
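A sketch of the moment coefficient of skewness, m3 / m2^(3/2) with divisor n, written in base R on hypothetical right-skewed data. This is the formula moments::skewness() uses. e1071::skewness() defaults to type = 3, which rescales it slightly, so its value can differ a little. The package call runs only if moments is installed.

```r
x <- c(2, 3, 3, 4, 4, 4, 5, 9, 15)   # hypothetical right-skewed data
n <- length(x)
m <- mean(x)

# Third central moment divided by (second central moment)^(3/2)
g1 <- (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^1.5

if (requireNamespace("moments", quietly = TRUE)) {
  print(moments::skewness(x))        # should agree with g1
}

g1                                   # positive: long right tail
```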
14. Correlation coefficient lies between ______.
-1 and +1 (inclusive).
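An illustration of the bounds. An exact increasing linear relation gives +1, an exact decreasing one gives -1, and any other pairing (here random noise, seeded) falls in between.

```r
x <- 1:10

r_pos <- cor(x, 3 * x + 2)    # perfect positive linear relation
r_neg <- cor(x, -2 * x + 5)   # perfect negative linear relation

set.seed(42)
r_rand <- cor(x, rnorm(10))   # unrelated values

c(r_pos, r_neg, r_rand)
```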
15. Function to compute covariance.
The cov() function (sample covariance, divisor n - 1).
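A sketch on hypothetical data comparing cov() with the formula written by hand, and showing that dividing the covariance by both standard deviations gives the correlation coefficient.

```r
x <- c(1, 2, 3, 4, 5)     # hypothetical data
y <- c(2, 4, 5, 4, 5)
n <- length(x)

c_r        <- cov(x, y)                                   # 6 / 4 = 1.5
c_manual   <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
r_from_cov <- c_r / (sd(x) * sd(y))                       # equals cor(x, y)

c(c_r, c_manual, r_from_cov)
```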
SECTION-B (2 x 5 = 10 Marks)
21. What is R? Write two advantages.
R is a language and environment for statistical computing and graphics.
- Advantage 1: It is free and open-source, with a large user community and thousands of add-on packages on CRAN.
- Advantage 2: It has excellent data visualization capabilities (base graphics, ggplot2, etc.).
23. Define xlab, ylab, col, main in R.
These are graphical parameters used in plotting functions:
xlab: Label for the x-axis.
ylab: Label for the y-axis.
col: Sets the color of the plot elements (points, lines, bars, etc.).
main: Sets the main title of the plot.
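All four parameters in one call, as a sketch on the built-in mtcars data. col takes one colour per bar here; the colour names are arbitrary choices. pdf(NULL) only lets the code run without a screen.

```r
counts <- table(mtcars$cyl)   # number of cars with 4, 6 and 8 cylinders

pdf(NULL)
bp <- barplot(counts,
              xlab = "Number of cylinders",             # x-axis label
              ylab = "Number of cars",                  # y-axis label
              col  = c("skyblue", "orange", "tomato"),  # one colour per bar
              main = "Cars by cylinder count")          # main title
dev.off()
```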
24. Define Boxplot with explaining its parts.
A boxplot summarizes the distribution of data using a "box and whiskers" display:
- Median: The line inside the box.
- Box: Spans the interquartile range (IQR), from Q1 to Q3.
- Whiskers: Lines extending from the box to the most extreme observations lying within 1.5*IQR of Q1 and Q3 (to the minimum and maximum when there are no outliers).
- Outliers: Points beyond the whiskers, plotted individually.
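boxplot.stats() returns the numbers that boxplot() draws, which makes each part concrete. The data are hypothetical, with one extreme value. R computes the box edges as Tukey's hinges (via fivenum()), which can differ slightly from quantile() for some sample sizes.

```r
x <- c(2, 4, 5, 7, 8, 9, 30)   # hypothetical data with one extreme value
b <- boxplot.stats(x)

b$stats  # lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
b$out    # observations beyond 1.5 * IQR from the box
```

Here the hinges are 4.5 and 8.5, so the fences are 4.5 - 6 = -1.5 and 8.5 + 6 = 14.5. The value 30 is an outlier, and the upper whisker stops at 9.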
26. R-code for 60th and 85th percentile (Data: 5, 7, 9, 15, 23, 10).
data <- c(5, 7, 9, 15, 23, 10)
quantile(data, probs = c(0.60, 0.85))  # 60%: 10, 85%: 17 (default type = 7)
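The same percentiles worked by hand, assuming R's default method (type = 7). Sort the data, find position h = (n - 1)p + 1, and interpolate linearly between the neighbouring order statistics.

```r
data <- c(5, 7, 9, 15, 23, 10)
p <- c(0.60, 0.85)

s  <- sort(data)                  # 5 7 9 10 15 23
h  <- (length(s) - 1) * p + 1     # positions 4.00 and 5.25
lo <- floor(h)

# Interpolate between s[lo] and the next order statistic
manual <- s[lo] + (h - lo) * (s[pmin(lo + 1, length(s))] - s[lo])
manual                            # 10 and 15 + 0.25 * (23 - 15) = 17
```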
29. R-code for computing range (Data: 8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11).
vals <- c(8, 9, 3, 10, 7, 2, 3, 8, 12, 15, 17, 11)
diff(range(vals))  # range = max - min = 17 - 2 = 15
SECTION-C (5 x 5 = 25 Marks)
33. R-code for Bar, Pie, Histogram, and Frequency Polygon.
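data_vector <- c(12, 15, 11, 18, 15, 20, 12, 15, 22, 18)  # hypothetical data: replace with the question's data
barplot(data_vector, main = "Bar Diagram")
pie(data_vector, main = "Pie Chart")
h <- hist(data_vector, main = "Histogram")  # draws the histogram and returns class mid-points and counts
lines(h$mids, h$counts, type = "b")         # frequency polygon drawn over the histogram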
35. R-code for Mean, Median, Variance, SD, and SE.
Data: 7, 3, 5, 9, 22, 7, 11, 13, 15, 10
x <- c(7, 3, 5, 9, 22, 7, 11, 13, 15, 10)
mean(x)    # 10.2
median(x)  # 9.5
var(x)     # 30.18 (divisor n - 1)
sd(x)      # 5.49
se <- sd(x) / sqrt(length(x))
se         # standard error = 1.74
37. R-code for Karl Pearson and Spearman Correlation.
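Data X: 10, 15, 7, 13, 5, 17, 12, 21, 11
Data Y: 80, 120, 45, 33, 37, 11, 15, 21
(As printed, Y has only 8 values against 9 for X. cor() needs vectors of equal length, so the missing Y value must be restored from the question paper before running the code.)
x <- c(10, 15, 7, 13, 5, 17, 12, 21, 11)
y <- c(80, 120, 45, 33, 37, 11, 15, 21)  # one value missing, see note above
cor(x, y, method = "pearson")   # Karl Pearson's correlation coefficient
cor(x, y, method = "spearman")  # Spearman's rank correlation coefficient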
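Spearman's coefficient is Pearson's coefficient applied to the ranks (ties get average ranks). The sketch checks this on the built-in mtcars data, used here only because the printed Y values are one short of X.

```r
x <- mtcars$wt    # illustrative data, not from the question
y <- mtcars$mpg

r_s    <- cor(x, y, method = "spearman")
r_rank <- cor(rank(x), rank(y))   # Pearson correlation of the ranks

c(spearman = r_s, pearson_on_ranks = r_rank)
```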
40. Simple vs Multiple Linear Regression Model.
Simple Linear Regression: Models the relationship between one dependent variable (Y) and one independent variable (X).
Formula: y = a + bx.
Multiple Linear Regression: Models the relationship between one dependent variable and two or more independent variables.
Formula: y = a + b1x1 + b2x2 + ... + bkxk.
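For the multiple model, lm() takes several predictors joined with +. This sketch uses weight and horsepower from the built-in mtcars data as an illustrative choice.

```r
# y = a + b1*wt + b2*hp
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit_multi)   # intercept a, then one slope per predictor
```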
R-code for fitting y = a + bx:
y <- c(7, 5, 3, 2, 11)
x <- c(4, 2, 9, 3, 2)
model <- lm(y ~ x)
coef(model)  # a (intercept) = 7.835, b (slope) = -0.559
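As a check on lm(), the least-squares estimates can be computed by hand. The slope is b = Sxy / Sxx, and the intercept is a = mean(y) - b * mean(x). For these data, Sxy = -19 and Sxx = 34.

```r
y <- c(7, 5, 3, 2, 11)
x <- c(4, 2, 9, 3, 2)

b <- cov(x, y) / var(x)      # slope = Sxy / Sxx = -19 / 34
a <- mean(y) - b * mean(x)   # line passes through (mean(x), mean(y))

c(a = a, b = b)              # about 7.835 and -0.559
```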