J3AS: Applied Statistics (2022-2023)

Open Book Handwritten Test 1 - MAM+ICS

Questions:

1. A market research company collected data from a restaurant in Jinan University Panyu campus to study the relation between the number of bowls of noodle soup sold in a day (variable y = soup) and the temperature (in °C) outside the restaurant that day (variable x = temp). A simple linear regression model was fitted to these data using R. The edited results in R are shown below, along with some summary statistics and with the relevant scatterplot.

> fm1 <- lm(soup ~ temp)

> summary(fm1)

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 98 degrees of freedom

Multiple R-squared: 0.8254, Adjusted R-squared: 0.8236

F-statistic: 463.3 on 1 and 98 DF, p-value: < 0.0001

> mean(temp)

[1] 58.10

> sd(temp)

[1] 19.10

 

(a) How many days were used in the sample? [2 marks]

(b) Write down the model being fitted as fm1 algebraically, carefully defining each symbol and making clear any distributional assumptions. [6 marks]

(c) Explain in words what the interpretation of the parameter estimate for temp is. [4 marks]

(d) Today the temperature is 30.2°C. Predict the number of bowls of noodle soup that will be sold today at the restaurant on campus. [4 marks]

(e) What is the estimated correlation between temp and soup? [4 marks]

(f) Calculate $\bar{y}$, the mean number of bowls of noodle soup, and $s_y$, the standard deviation of the number of bowls of noodle soup among the observations. [10 marks]

(g) Calculate the standard error of the residuals around the simple regression line for these data. [14 marks]

(h) Propose a transformation of y = soup and x = temp that may give a better regression model. Provide 1-3 sentences justifying why this transformation may be a better choice. [6 marks]
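
The quantities asked for in Question 1 are linked by a few standard identities of simple linear regression. The R sketch below checks those identities on simulated data. The data, the seed, and every name other than `soup` and `temp` are assumptions made for illustration only; none of the numbers correspond to the restaurant data.

```r
# Hypothetical simulated data; these values are NOT the restaurant data.
set.seed(1)
n <- 50
temp <- runif(n, min = 10, max = 35)
soup <- 200 - 3 * temp + rnorm(n, sd = 10)

fit <- lm(soup ~ temp)
s <- summary(fit)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["temp"]]

# Sample size = residual degrees of freedom + 2 estimated coefficients
n_from_df <- fit$df.residual + 2

# In simple linear regression R^2 is the squared sample correlation,
# and the correlation takes the sign of the fitted slope
r <- sign(b1) * sqrt(s$r.squared)

# The fitted line passes through (mean(x), mean(y))
y_bar <- b0 + b1 * mean(temp)

# Slope identity b1 = r * s_y / s_x, rearranged for s_y
s_y <- b1 * sd(temp) / r

# Residual standard error from R^2 and s_y:
# SST = (n - 1) * s_y^2, SSE = (1 - R^2) * SST, RSE = sqrt(SSE / (n - 2))
sse <- (1 - s$r.squared) * (n - 1) * s_y^2
rse <- sqrt(sse / (n - 2))

# Point prediction at a new temperature (20 is an arbitrary example value)
pred <- predict(fit, newdata = data.frame(temp = 20))
```

Each derived value can be compared with what R reports directly: `cor(temp, soup)`, `mean(soup)`, `sd(soup)` and `s$sigma`.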

2. (a) Explain two similarities and two differences between Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). Use one or two sentences for each case. [8 marks]

(b) Suppose we have a two-class setup with classes 0 and 1, i.e., $y \in \{0, 1\}$, and a two-dimensional predictor variable $x = (x_1, x_2)^T$. We find that the means of the two groups are $\hat{\mu}_0 = (-1, -1)^T$ and $\hat{\mu}_1 = (1, 1)^T$ respectively. The estimated prior class probabilities $\hat{\pi}_0$ and $\hat{\pi}_1$ are equal. Suppose that we model each class with its own covariance matrix. We estimate the covariance matrix for class 0 as,

and for class 1 as

1(/50).

Find the decision boundary and draw a sketch of it in the two-dimensional plane. Show clearly on the plot the regions where the classifier will predict class 1 and where it will predict class 0. [22 marks]
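
For reference, the setup in part (b) is that of QDA, where each class has its own covariance matrix. The block below states the usual form of the QDA discriminant function. The symbols $\hat{\Sigma}_k$, $\hat{\mu}_k$ and $\hat{\pi}_k$ are assumed here to denote the estimated covariance matrix, mean and prior probability of class $k$; the covariance matrices themselves are not reproduced.

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\hat{\Sigma}_k\rvert
              - \tfrac{1}{2}(x - \hat{\mu}_k)^T \hat{\Sigma}_k^{-1} (x - \hat{\mu}_k)
              + \log\hat{\pi}_k, \qquad k \in \{0, 1\}

\text{Decision boundary: } \{\, x : \delta_0(x) = \delta_1(x) \,\}
```

The classifier predicts class 1 where $\delta_1(x) > \delta_0(x)$ and class 0 otherwise. When the priors are equal, the $\log\hat{\pi}_k$ terms cancel from the boundary equation.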

3. Suppose we collect data from a group of students in a Machine learning class with variables $x_1$ = hours studied, $x_2$ = grade point average (GPA), and $y$ = a binary output indicating whether the student received a mark of 5 in the class ($y = 1$) or not ($y = 0$). We learn a logistic regression model,

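$$\Pr(y = 1 \mid x = (x_1, x_2)) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}$$

with parameters $\beta_0 = -6$, $\beta_1 = 0.05$, $\beta_2 = 1$.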

(a) Estimate the probability, according to the logistic regression model, that a student who studies for 40 hours and has a grade point average of 3.5 gets a 5 in the Machine learning class. [8 marks]

(b) According to the logistic regression model, how many hours would the student in part (a) need to study to have 50% chance of getting a 5 in the class? [8 marks]

(c) How would you expect the bias and variance of the classifier to change if we use QDA to fit the data instead? [4 marks]
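
A two-predictor logistic model of the kind used in Question 3 can be evaluated, and inverted for one predictor, with a few lines of R. The sketch below is a minimal illustration: the function names `logistic_prob` and `x1_for_prob` and the coefficient vector `beta_demo` are hypothetical, and the demo values are deliberately not those of the question.

```r
# Pr(y = 1 | x) = exp(eta) / (1 + exp(eta)), with eta = b0 + b1 * x1 + b2 * x2
logistic_prob <- function(beta, x1, x2) {
  eta <- beta[1] + beta[2] * x1 + beta[3] * x2
  plogis(eta)  # plogis(eta) computes exp(eta) / (1 + exp(eta))
}

# Value of x1 at which the model gives probability p for a fixed x2:
# solve logit(p) = b0 + b1 * x1 + b2 * x2 for x1
x1_for_prob <- function(beta, p, x2) {
  (qlogis(p) - beta[1] - beta[3] * x2) / beta[2]
}

# Hypothetical coefficients (b0, b1, b2) and inputs, for illustration only
beta_demo <- c(-2, 0.1, 0.5)
p_demo <- logistic_prob(beta_demo, x1 = 10, x2 = 2)
x1_half <- x1_for_prob(beta_demo, p = 0.5, x2 = 2)
```

Since `qlogis(0.5)` is 0, a 50% probability corresponds to the point where the linear predictor `eta` equals zero.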