MAT3375: Midterm Examination

Date: Thursday October 20, 2022

1. Consider a simple linear regression model: $y_i = \beta_0 + \beta_1 x_i + e_i$. (10 points)

a. List the Gauss-Markov conditions. (3 points)

b. Define the residuals, and describe how we expect them to behave when the Gauss-Markov conditions are satisfied. (2 points)

c. Suppose the Gauss-Markov conditions are satisfied. Show that $\hat{\beta}_0$ and $\hat{\beta}_1$ (formulas given below) are unbiased estimators of the true parameters $\beta_0$ and $\beta_1$. (5 points)

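$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$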
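The core of the argument in part (c) can be sketched as follows. This is only an outline, assuming the standard least-squares forms $\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, with the $x_i$ treated as fixed; the shorthand $S_{xx}$ and the weights $c_i$ are notation introduced here, not part of the exam.

```latex
\begin{align*}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
c_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad
\sum_{i=1}^{n} c_i = 0, \qquad \sum_{i=1}^{n} c_i x_i = 1, \\
\hat{\beta}_1 &= \sum_{i=1}^{n} c_i y_i
  \quad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) \bar{y} = 0, \\
E[\hat{\beta}_1] &= \sum_{i=1}^{n} c_i \, E[y_i]
  = \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i)
  = \beta_0 \cdot 0 + \beta_1 \cdot 1 = \beta_1, \\
E[\hat{\beta}_0] &= E[\bar{y}] - \bar{x} \, E[\hat{\beta}_1]
  = (\beta_0 + \beta_1 \bar{x}) - \beta_1 \bar{x} = \beta_0.
\end{align*}
```

Only the zero-mean condition $E[e_i] = 0$ is used here; the constant-variance and uncorrelated-errors conditions matter for the variance of the estimators, not for unbiasedness.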

2. A sample of data from the Framingham Heart Study was analyzed in R to produce the following results. A multiple regression model was fit, where the dependent variable is systolic blood pressure (sbp1) and the independent variables are age (in years), sex (male=1, female=2) and diabetes status (positive=1, negative=0). (15 points)

Call:
lm(formula = data1$sbp1 ~ data1$age + data1$sex + data1$diabetes)

Residuals:
     Min       1Q   Median       3Q      Max
-25.4684  -7.8459   0.1738   6.5514  31.1936

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                           13.558  < 2e-16 ***
data1$age                              3.955 0.000274 ***
data1$sex2                            -0.309 0.759090
data1$diabetes1                        2.621 0.012003 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.53 on 44 degrees of freedom
Multiple R-squared:  0.4188,  Adjusted R-squared:  0.3792
F-statistic: 10.57 on 3 and 44 DF,  p-value: 2.34e-05

a. How many individuals were used in the above analysis? (2 points)

b. $R^2 = 0.4188$ represents the proportion of total variation in blood pressure (the dependent variable) that is explained by the regression model. What is the total variation? The total variation (corrected sum of squares) of blood pressure ($y$) is given by $\sum_{i=1}^{n} (y_i - \bar{y})^2$. (4 points)
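One way to approach parts (a) and (b) is to work backwards from the summary statistics printed by R. The sketch below assumes the usual identities: residual degrees of freedom = n − (k + 1), residual standard error² = SSE / residual df, and R² = 1 − SSE / SST. The variable names are illustrative, and the inputs are the rounded values from the output, so the results are approximate.

```r
# Values read from the R summary in Question 2 (rounded as printed)
rse      <- 11.53   # residual standard error
df_resid <- 44      # residual degrees of freedom
k        <- 3       # slope coefficients: age, sex2, diabetes1
r2       <- 0.4188  # multiple R-squared

n   <- df_resid + k + 1   # sample size implied by the degrees of freedom
sse <- rse^2 * df_resid   # residual sum of squares
sst <- sse / (1 - r2)     # total (corrected) sum of squares
ssr <- sst - sse          # regression sum of squares

# Consistency check: the F statistic rebuilt from the sums of squares
f_check <- (ssr / k) / (sse / df_resid)

print(c(n = n, sse = sse, sst = sst, f_check = f_check))
```

With these inputs the total variation comes out at roughly 1.0 × 10^4, and the rebuilt F statistic agrees with the reported 10.57 up to rounding.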

c. How do you interpret the estimates of the regression coefficients for age, sex and diabetes status? (3 points)

d. Write down the hypotheses for testing the overall significance of the regression model, and provide a complete description of the test statistic used, together with its distribution under the null hypothesis. Based on the results from our analysis, do we reject the null hypothesis? Justify your answer. (2 points)

e. Consider the p-value = 0.000274 corresponding to age. Describe the steps used to calculate this p-value. (2 points)
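The calculation behind part (e) can be illustrated in R. This is a minimal sketch, assuming the usual two-sided t-test of H0: β_age = 0, where the t value is the coefficient estimate divided by its standard error and is compared against a t distribution with the residual degrees of freedom. Only the rounded t value from the output is used, so the result matches the reported p-value only approximately.

```r
t_value  <- 3.955   # t value for age, as printed in the summary
df_resid <- 44      # residual degrees of freedom

# Two-sided p-value: probability of |T| >= |t| under a t distribution with 44 df
p_age <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
print(p_age)
```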

f. How is the last p-value (p-value: 2.34e-05) calculated? Show your steps. (2 points)
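For part (f), the p-value is an upper-tail probability of the F distribution. The sketch below assumes the overall F-test of H0: all slope coefficients equal zero, with 3 numerator and 44 denominator degrees of freedom as reported; because the F statistic is entered as the rounded 10.57, the result is approximate.

```r
f_value <- 10.57   # overall F statistic, as printed in the summary
df1     <- 3       # numerator df: number of slope coefficients
df2     <- 44      # denominator df: residual degrees of freedom

# Upper-tail probability P(F >= f_value) under F(3, 44)
p_model <- pf(f_value, df1 = df1, df2 = df2, lower.tail = FALSE)
print(p_model)
```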

3. Consider multiple linear regression with $k$ independent variables, described in matrix form as $y = X\beta + \epsilon$, where $y$ is a column vector of length $n$; $\beta$ is a column vector of length $k + 1$ consisting of the regression coefficients (including the intercept); $X$ is an $n \times (k + 1)$ matrix consisting of measurements of the $k$ independent variables and one additional column of 1's corresponding to the intercept; and $\epsilon$ is a column vector of length $n$ consisting of the error terms. Suppose the error terms are independently and identically distributed according to the normal distribution with mean zero and variance $\sigma^2$, i.e., $\epsilon_i \sim N(0, \sigma^2)$. The maximum likelihood estimator of $\beta$ is given by $\hat{\beta} = (X'X)^{-1}X'y$. (10 points)
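The matrix formula for the estimator can be checked numerically. The following sketch uses simulated data; the sample size, number of predictors, true coefficients and seed are all hypothetical choices made for illustration. It compares the closed-form estimate with the coefficients returned by `lm()`.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
n <- 50                                      # hypothetical sample size
k <- 2                                       # hypothetical number of predictors
X <- cbind(1, matrix(rnorm(n * k), n, k))    # first column of 1's is the intercept
beta <- c(2, -1, 0.5)                        # hypothetical true coefficients
y <- drop(X %*% beta + rnorm(n, sd = 1))     # y = X beta + normal errors

# Solve the normal equations (X'X) b = X'y instead of inverting X'X explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() adds its own intercept, so pass only the predictor columns
fit <- lm(y ~ X[, -1])
print(all.equal(unname(coef(fit)), drop(beta_hat)))
```

Solving the linear system with `solve(A, b)` gives the same estimate as forming $(X'X)^{-1}$ and multiplying, and is generally the more numerically stable choice.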