Please print your name and ID clearly on the first page of your assignment.

Important: If you must present R code and output, report them in a concise and presentable fashion (for example, tabulate models and output), and show only the exploration and results that are crucial for answering the questions. Describe and discuss your important explorations and findings. Please attach all the raw R code and output at the end of the assignment as evidence of your independent work; the attachment will not be marked. Prepare your answers so that they are easy for readers (in this case, markers) to follow, without needing to search through the lengthy attached code and output.

1. (a) Suppose Y is the response variable in a study, and there is a covariate factor that has 3 levels labeled by 1, 2, 3. Define dummy variables (x1, x2) and a regression model to explain the dependence between the response and the covariate factor. Take level 3 to be the reference level. What parameters or functions of the parameters describe the expected responses of the levels 1, 2, and 3 respectively?
(b) Suppose in addition there is a continuous covariate x3, and the response is now described by the model

Y = β0 + β1 x1 + β2 x2 + β3 x3 + β13 x1 x3 + β23 x2 x3 + ε,

where x1, x2 are defined as in (a). Explain in words the meaning of the interaction effects β13 and β23.

Note added on Nov. 20: The model expression and the interaction terms in (b) have been updated.
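
As an illustration of the coding used in Problem 1, the following minimal sketch shows how R builds dummy variables when level 3 is the reference. The toy data and the names g, X_main and X_int are hypothetical and are not part of the assignment.

```r
# Hypothetical toy data: a 3-level factor and a continuous covariate.
g  <- factor(c(1, 1, 2, 2, 3, 3))
x3 <- c(0.5, 1.2, 0.7, 1.9, 1.1, 0.3)

# Make level 3 the reference, so that R's default treatment contrasts
# create indicator columns for levels 1 and 2 only.
g <- relevel(g, ref = "3")

# Design matrix for the factor-only model: intercept, x1 (g1), x2 (g2).
X_main <- model.matrix(~ g)

# Design matrix with the continuous covariate and its interactions:
# adds x3, x1*x3 (g1:x3) and x2*x3 (g2:x3).
X_int <- model.matrix(~ g * x3)

print(X_main)
print(colnames(X_int))
```

Rows belonging to level 3 have zeros in both dummy columns, so only the intercept (and, in the second matrix, the x3 column) contributes to their expected response.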

2. (a) For the "cars.txt" data considered in Problem 5 of Assignment 3, create and export a new data file "cars1.txt" to your working directory, in which the wt column of the original file is removed and a wtlevel column is added. The wtlevel column is created such that automobiles with wt greater than the median weight are labeled H (for heavy), and all other automobiles L (for light). Import the new file into the R workspace and display the first 6 observations (rows). Complete all the tasks in R and show the code.
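
The workflow in (a) might look roughly like the sketch below. It uses the built-in mtcars data as a stand-in, on the assumption that "cars.txt" contains at least mpg, hp and wt columns, and it writes to a temporary directory rather than the working directory.

```r
# Stand-in data: a few columns of the built-in mtcars data set.
# (Assumption: "cars.txt" has at least mpg, hp and wt columns.)
cars <- mtcars[, c("mpg", "hp", "wt")]

# Label cars heavier than the median weight "H" and all others "L".
cars$wtlevel <- ifelse(cars$wt > median(cars$wt), "H", "L")

# Remove the original wt column.
cars$wt <- NULL

# Export the new file (temporary directory here; replace tempdir()
# with getwd() to write to the working directory).
out_file <- file.path(tempdir(), "cars1.txt")
write.table(cars, file = out_file, quote = FALSE, row.names = FALSE)

# Import the file again and display the first 6 rows.
cars1 <- read.table(out_file, header = TRUE, stringsAsFactors = TRUE)
head(cars1)
```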

(b) Define a regression model that describes the mpg (miles per gallon) response by hp (horsepower, a continuous covariate), the wtlevel factor, and their interaction. Define the light automobile (L) category as the reference level for wtlevel. Fit the model, and interpret the main effects and the interaction effect. Are the effects significantly different from 0?

(c) Use R to draw a scatter plot of mpg against hp, distinguishing the two weight levels by the symbols H and L on the plot. Specify, for each weight level, how the expected mpg changes as a function of horsepower. Plot these functions by adding them to the scatter plot.
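
One possible way to set the reference level, fit the interaction model and overlay the two fitted lines is sketched below. It again uses mtcars as a stand-in for the course data, and the object names are illustrative assumptions.

```r
# Stand-in data (assumption: same columns as the course file).
cars <- mtcars[, c("mpg", "hp", "wt")]

# Listing "L" first in levels makes it the reference level.
cars$wtlevel <- factor(ifelse(cars$wt > median(cars$wt), "H", "L"),
                       levels = c("L", "H"))

# Main effects of hp and wtlevel plus their interaction.
fit <- lm(mpg ~ hp * wtlevel, data = cars)
b <- coef(fit)  # names: (Intercept), hp, wtlevelH, hp:wtlevelH

# Scatter plot with each point drawn as the letter H or L.
plot(cars$hp, cars$mpg, type = "n", xlab = "hp", ylab = "mpg")
text(cars$hp, cars$mpg, labels = as.character(cars$wtlevel))

# Fitted line for L (reference): intercept and slope as estimated.
abline(a = b["(Intercept)"], b = b["hp"], lty = 1)

# Fitted line for H: intercept and slope shifted by the H terms.
abline(a = b["(Intercept)"] + b["wtlevelH"],
       b = b["hp"] + b["hp:wtlevelH"], lty = 2)
legend("topright", legend = c("L", "H"), lty = c(1, 2))
```

The significance of each effect can then be read from summary(fit).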

3. (a) For the "Savings.txt" data considered in Problem 5 of Assignment 2, we fitted a model with the savings ratio, sr, as the response, and all other variables (pop15, pop75, dpi, ddpi) as the covariates. The R² of the fitted model is only around 0.35. Using R, plot the residuals of this model against the fitted values, and add bands at ±2σ̂ to help assess the range (spread) of the residuals. Do you notice any violations of the model assumptions? Will a Box-Cox transformation of the response variable improve the model fit? What is a reasonable λ value suggested by boxcox() in R, and what is the corresponding type of transformation?
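
A minimal sketch of the residual plot with ±2σ̂ bands and of the boxcox() call follows. It assumes the built-in LifeCycleSavings data, which has the same column names, is an acceptable stand-in for "Savings.txt"; the lambda grid is an arbitrary choice.

```r
library(MASS)  # provides boxcox()

# Stand-in data: LifeCycleSavings has sr, pop15, pop75, dpi, ddpi.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Residual standard error, used as the estimate of sigma.
sigma_hat <- summary(fit)$sigma

# Residuals against fitted values, with reference lines at 0 and +/- 2 sigma_hat.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 3)
abline(h = c(-2, 2) * sigma_hat, lty = 2)

# Profile log-likelihood over a grid of lambda values
# (requires a strictly positive response).
bc <- boxcox(fit, lambda = seq(-1, 2, by = 0.05), plotit = TRUE)

# Lambda with the highest profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

The plot produced by boxcox() also marks an approximate 95% confidence interval for λ; a convenient value inside that interval (such as 1, 0.5 or 0) is usually preferred to the exact maximizer.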

(b) Fit another regression model for the response sr, with pop15 and ddpi as the only covariates. Define this model and the model in (a) in mathematical form. Compare the two models by an F test, and clearly describe the test procedure (hypotheses, F ratio and its distribution, observed statistic, rejection criterion and conclusion). Test at significance level α = 0.05 (the default).
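
The partial F test for nested models can be sketched as below, both with anova() and by hand from the residual sums of squares. LifeCycleSavings is again used as an assumed stand-in for the course data.

```r
# Full and reduced (nested) models on the stand-in data.
full    <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
reduced <- lm(sr ~ pop15 + ddpi, data = LifeCycleSavings)

# Built-in comparison of nested models.
tab <- anova(reduced, full)
print(tab)

# The same F ratio computed by hand.
sse_r <- sum(resid(reduced)^2)   # error sum of squares, reduced model
sse_f <- sum(resid(full)^2)      # error sum of squares, full model
df_r  <- df.residual(reduced)
df_f  <- df.residual(full)
F_obs <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# p-value and critical value at alpha = 0.05.
p_val  <- pf(F_obs, df_r - df_f, df_f, lower.tail = FALSE)
F_crit <- qf(0.95, df_r - df_f, df_f)
c(F_obs = F_obs, F_crit = F_crit, p_val = p_val)
```

The null hypothesis that the dropped coefficients are all zero is rejected when F_obs exceeds F_crit, or equivalently when p_val is below 0.05.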

(c) Instead of considering pop75, now consider a function of pop75, denoted by pop75fn. One regression model (C) has covariates pop15, pop75fn, dpi, ddpi, and another model (C1) has covariates pop15, pop75fn, ddpi. The values predicted by models C and C1 are given in the file "SavingsPred.txt" (one column per model). Can we drop dpi from model C? Answer the question by computing an ANOVA decomposition table (like Table 2 of Section 6.1) and conducting an F test. Define the models and describe the test procedure. Also find the R² values for the two models.
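
When only the observed responses and the predicted values of two nested models are available, the sums of squares can be rebuilt directly. The sketch below uses a hypothetical helper, anova_from_pred(), and stand-in predictions from LifeCycleSavings fits (not the values in "SavingsPred.txt"); the table layout is an assumption and may differ from Table 2 of Section 6.1.

```r
# Hypothetical helper: partial F test and R^2 from observed responses and
# the fitted values of two nested models. p_full and p_red are the numbers
# of regression coefficients, intercept included.
anova_from_pred <- function(y, pred_full, pred_red, p_full, p_red) {
  n        <- length(y)
  sst      <- sum((y - mean(y))^2)      # total sum of squares
  sse_full <- sum((y - pred_full)^2)    # error SS, full model
  sse_red  <- sum((y - pred_red)^2)     # error SS, reduced model
  df_num   <- p_full - p_red
  df_den   <- n - p_full
  F_obs    <- ((sse_red - sse_full) / df_num) / (sse_full / df_den)
  list(
    table = data.frame(
      Source = c("Reduced model", "Extra terms", "Error (full)", "Total"),
      Df     = c(p_red - 1, df_num, df_den, n - 1),
      SS     = c(sst - sse_red, sse_red - sse_full, sse_full, sst)
    ),
    F_obs = F_obs,
    p_val = pf(F_obs, df_num, df_den, lower.tail = FALSE),
    r2    = c(full = 1 - sse_full / sst, reduced = 1 - sse_red / sst)
  )
}

# Stand-in predictions from two nested fits (pop75 is used here in place
# of the unspecified pop75fn).
full <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
red  <- lm(sr ~ pop15 + pop75 + ddpi, data = LifeCycleSavings)
res  <- anova_from_pred(LifeCycleSavings$sr, fitted(full), fitted(red),
                        p_full = 5, p_red = 4)
res$table
res[c("F_obs", "p_val", "r2")]
```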

4. The complete data set "carscomplete.txt" for the automobile study has the following additional columns, other than those given in the "cars.txt" file:

vs      Engine (0 = V-shaped, 1 = straight)
am      Transmission (0 = automatic, 1 = manual)
gear    Number of forward gears
carb    Number of carburetors

Let mpg be the response variable, consider all other variables as potential explanatory variables, and keep all variables in their original forms. Choose the "best" model using each of the following methods. For each of your "best" models, describe the model, summarize and report the fit, plot the residuals against the fitted values, and assess the model fit.
(a) Backward elimination method.
(b) Forward selection method.
Note added on Nov. 20: Please take SLE = 0.15 (significance level for entry) and SLS = 0.05 (significance level for staying).
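
R's step() function selects by AIC rather than by p-value thresholds, so p-value-based selection has to be coded by hand. The following is a minimal sketch built on drop1() and add1() with F tests; the helper names backward_p() and forward_p() are hypothetical, and the built-in mtcars data is assumed to be a stand-in for "carscomplete.txt".

```r
# Backward elimination: repeatedly drop the term with the largest
# p-value until every remaining term has p-value <= sls.
backward_p <- function(fit, sls = 0.05) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab   <- drop1(fit, test = "F")
    pvals <- tab[["Pr(>F)"]][-1]      # first row is "<none>"
    cand  <- rownames(tab)[-1]
    if (max(pvals) <= sls) break
    worst <- cand[which.max(pvals)]
    fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Forward selection: repeatedly add the candidate with the smallest
# p-value as long as that p-value is below sle.
forward_p <- function(null_fit, scope, sle = 0.15) {
  fit       <- null_fit
  all_terms <- attr(terms(scope), "term.labels")
  repeat {
    remaining <- setdiff(all_terms, attr(terms(fit), "term.labels"))
    if (length(remaining) == 0) break
    tab   <- add1(fit, scope = scope, test = "F")
    pvals <- tab[["Pr(>F)"]][-1]      # first row is "<none>"
    cand  <- rownames(tab)[-1]
    if (min(pvals) >= sle) break
    best  <- cand[which.min(pvals)]
    fit   <- update(fit, as.formula(paste(". ~ . +", best)))
  }
  fit
}

# Stand-in data: all mtcars columns other than mpg are candidates.
scope <- ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
bw <- backward_p(lm(mpg ~ ., data = mtcars), sls = 0.05)
fw <- forward_p(lm(mpg ~ 1, data = mtcars), scope = scope, sle = 0.15)
formula(bw)
formula(fw)
```

This sketch does not re-test previously entered terms after each forward step; a full stepwise procedure would alternate the two checks using both thresholds.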