J3AS: Applied Statistics (2022-2023)
Open Book Handwritten Test 1 - MAM+ICS
Questions:
1. A market research company collected data from a restaurant in Jinan University Panyu campus to study the relation between the number of bowls of noodle soup sold in a day (variable y = soup) and the temperature (in °C) outside the restaurant that day (variable x = temp). A simple linear regression model was fitted to these data using R. The edited results in R are shown below, along with some summary statistics and with the relevant scatterplot.
> fm1 <- lm(soup ~ temp)
> summary(fm1)
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 98 degrees of freedom
Multiple R-squared: 0.8254, Adjusted R-squared: 0.8236
F-statistic: 463.3 on 1 and 98 DF, p-value: < 0.0001
> mean(temp)
[1] 58.10
> sd(temp)
[1] 19.10
(a) How many days were used in the sample? [2 marks]
(b) Write down the model being fitted as fm1 algebraically, carefully defining each symbol and making clear any distributional assumptions. [6 marks]
(c) Explain in words what the interpretation of the parameter estimate for temp is. [4 marks]
(d) Today the temperature is 30.2°C. Predict the number of bowls of noodle soup that will be sold today at the campus restaurant. [4 marks]
(e) What is the estimated correlation between temp and soup? [4 marks]
(f) Calculate $\bar{y}$, the mean number of bowls of noodle soup, and $\sigma_y$, the standard deviation for the number of bowls of noodle soup among the observations. [10 marks]
(g) Calculate the standard error of the residuals around the simple regression line for these data. [14 marks]
(h) Provide a different transformation on y = soup and x = temp to make a better regression model. Provide 1-3 sentences of justification for why this may be a better choice for a transformation. [6 marks]
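Several of the quantities in Question 1 are tied together by standard simple-regression identities, and the figures printed in the summary output can be cross-checked against each other. The R sketch below is only a working aid under those identities, not part of the original test. The names `df_resid`, `r_squared`, `sd_y_from_slope` and `residual_se` are illustrative, and the two helper functions take the slope and the standard deviation of soup as arguments because the coefficient table is not reproduced in the output shown.

```r
# Values copied from the summary(fm1) output printed in Question 1.
df_resid  <- 98      # residual degrees of freedom
r_squared <- 0.8254  # Multiple R-squared

# A simple linear regression estimates two coefficients, so n = df + 2.
n <- df_resid + 2

# With one predictor, R-squared equals the squared sample correlation.
# The sign of r follows the sign of the temp slope (not shown in the output).
abs_r <- sqrt(r_squared)

# Consistency checks against the other printed figures.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid
f_statistic   <- r_squared / (1 - r_squared) * df_resid

# Hypothetical helpers: slope and sd are arguments, not values from the test.
# From r = slope * sd_x / sd_y, the sd of y is |slope| * sd_x / |r|.
sd_y_from_slope <- function(slope, sd_x, abs_r) abs(slope) * sd_x / abs_r

# Residual standard error: sqrt(RSS / (n - 2)),
# with RSS = (1 - R^2) * (n - 1) * sd_y^2.
residual_se <- function(sd_y, r_squared, n) {
  sd_y * sqrt((1 - r_squared) * (n - 1) / (n - 2))
}

print(n)                        # 100
print(round(abs_r, 4))          # 0.9085
print(round(adj_r_squared, 4))  # 0.8236
print(round(f_statistic, 1))    # 463.3
```

The adjusted R-squared and F-statistic recomputed this way agree with the printed values, which suggests the reading of n = 100 is consistent with the rest of the output.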
2. (a) Explain two similarities and two differences between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). Use one or two sentences for each case. [8 marks]
(b) Suppose we have a two-class setup with classes 0 and 1, i.e., $y \in \{0, 1\}$, and a two-dimensional predictor variable $x = (x_1, x_2)^T$. We find that the means of the two groups are $\hat{\mu}_0 = (-1, -1)^T$ and $\hat{\mu}_1 = (1, 1)^T$ respectively. The estimated prior class probabilities $\hat{\pi}_0$ and $\hat{\pi}_1$ are equal. Suppose that we model each class with its own covariance matrix. We estimate the covariance matrix for class 0 as,
and for class 1 as
[matrix entries not recoverable from the source].
Find the decision boundary and draw a sketch of it in the two-dimensional plane. Show clearly on the plot the regions where the classifier will predict class 1 and where it will predict class 0. [22 marks]
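One way to explore a QDA decision boundary numerically is to compare the two quadratic discriminant scores at a point and predict the class with the larger score. The R sketch below assumes Gaussian class densities and equal priors of 0.5. The covariance matrices `Sigma0` and `Sigma1` are placeholders chosen purely for illustration and are NOT the matrices from the question, so the given ones would need to be substituted before drawing any conclusion. The function names `qda_score` and `predict_class` are likewise illustrative.

```r
# Quadratic discriminant score for one class:
# -0.5 * log|Sigma| - 0.5 * (x - mu)' Sigma^{-1} (x - mu) + log(prior)
qda_score <- function(x, mu, Sigma, prior) {
  d <- x - mu
  -0.5 * log(det(Sigma)) -
    0.5 * drop(t(d) %*% solve(Sigma) %*% d) +
    log(prior)
}

# Class means as stated in the question.
mu0 <- c(-1, -1)
mu1 <- c(1, 1)

# HYPOTHETICAL covariance matrices, for illustration only.
Sigma0 <- diag(2)
Sigma1 <- 4 * diag(2)

# Predict class 1 where its score exceeds that of class 0.
predict_class <- function(x) {
  s0 <- qda_score(x, mu0, Sigma0, prior = 0.5)
  s1 <- qda_score(x, mu1, Sigma1, prior = 0.5)
  as.integer(s1 > s0)
}

print(predict_class(c(-1, -1)))  # 0 under the placeholder matrices
print(predict_class(c(1, 1)))    # 1 under the placeholder matrices
```

The decision boundary is the set of points where the two scores are equal. Evaluating `predict_class` over a grid of (x1, x2) values is one way to check a hand-drawn sketch of the two regions.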
3. Suppose we collect data from a group of students in a Machine Learning class with variables $x_1$ = hours studied, $x_2$ = grade point average (GPA), and y = a binary outcome indicating whether the student received a mark of 5 in the class (y = 1) or not (y = 0). We learn a logistic regression model,
$\Pr(y = 1 \mid x = (x_1, x_2)) = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}$,
with parameters $\beta_0 = -6$, $\beta_1 = 0.05$, $\beta_2 = 1$.
(a) Estimate the probability according to the logistic regression model that a student who studies for 40 hours and has a grade point average of 3.5 gets a 5 in the Machine Learning class. [8 marks]
(b) According to the logistic regression model, how many hours would the student in part (a) need to study to have a 50% chance of getting a 5 in the class? [8 marks]
(c) How would you expect the bias and variance of the classifier to change if we use QDA to fit the data instead? [4 marks]
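The arithmetic behind parts (a) and (b) of Question 3 can be checked with a few lines of R. This sketch assumes the model has the standard logistic form exp(eta) / (1 + exp(eta)) with linear predictor eta = b0 + b1 * hours + b2 * GPA and coefficients -6, 0.05 and 1. The function names `prob_five` and `hours_for_prob` are illustrative, not part of the test.

```r
# Coefficients of the logistic model (intercept, hours studied, GPA).
beta <- c(b0 = -6, b1 = 0.05, b2 = 1)

# Probability of a mark of 5 given hours studied and GPA.
prob_five <- function(hours, gpa) {
  eta <- beta[["b0"]] + beta[["b1"]] * hours + beta[["b2"]] * gpa
  plogis(eta)  # same as exp(eta) / (1 + exp(eta))
}

# Hours needed to reach probability p at a given GPA:
# solve qlogis(p) = b0 + b1 * hours + b2 * gpa for hours.
hours_for_prob <- function(p, gpa) {
  (qlogis(p) - beta[["b0"]] - beta[["b2"]] * gpa) / beta[["b1"]]
}

print(round(prob_five(40, 3.5), 4))  # 0.3775
print(hours_for_prob(0.5, 3.5))      # 50
```

Because qlogis(0.5) is 0, the 50% case reduces to setting the linear predictor to zero, which is why the answer does not depend on the exponential at all.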
2023-04-15