MATH38141 Regression Analysis - Coursework 2022
This coursework accounts for 30% of your overall mark for this course and it may take around 10 hours to complete. Please present your solution in the form of a report, which you should upload to Blackboard as a single file before the deadline. You can use R to do your calculations, but you must show in the text the formulae (not as R code) that you have used in your calculations. Marks will be awarded for correct and accurate calculations and their interpretation. Interpretations should be made in the context of the problem rather than in terms of generic symbols only. Marks will be more difficult to obtain if the presentation of the results is not effective or if any formulae used in the calculations are missing from the text.
Submit your solution as a single file to Blackboard by 6 pm on November 21, 2022.
1. The taste of matured cheese is related to the concentration of several chemicals in the final product. In a study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were analyzed for their chemical composition and were subjected to taste tests. The data set consists of 30 samples of mature cheddar cheese. Observations were made on 4 variables:
• Taste - subjective taste test score,
• Acetic - natural logarithm of concentration of acetic acid,
• H2S - natural logarithm of concentration of hydrogen sulfide,
• Lactic - concentration of lactic acid.
An EXCEL spreadsheet containing the above data is available on Blackboard.
(a) Draw scatterplots of Taste against each of the other three variables. Describe any observable trends in your plots.
(b) Formulate a multiple linear regression model for the dataset, using Taste as the response and the remaining three variables as regressors.
(c) Calculate the least squares estimates (LSEs) and construct 95% confidence intervals for all regression coefficients.
(d) Give interpretations of the estimated coefficients obtained in (c).
(e) Calculate and interpret the R² statistic for the model.
(f) It is argued that when fitting a multiple linear regression model to the data, using Taste as the response and the other factors as the explanatory variables, the intercept term β0 should be set to zero. Is this argument reasonable? Why?
(10 marks)
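For reference, the quantities asked for in (c) and (e) can be written in standard matrix notation. This is only a sketch under the usual normal-theory assumptions (independent errors with mean zero and common variance σ²); the symbols X, y, n and p are notation assumed here rather than taken from the question. For the cheese data, n = 30 and p = 4 (intercept plus three regressors).

```latex
% Model: y = X\beta + \varepsilon, with X the n x p design matrix (first column of ones)
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\mathrm{RSS} = (y - X\hat{\beta})^{\top}(y - X\hat{\beta}),
\qquad
\hat{\sigma}^{2} = \frac{\mathrm{RSS}}{n-p}

% 95% confidence interval for the j-th coefficient
\hat{\beta}_{j} \;\pm\; t_{n-p;\,0.975}\,\hat{\sigma}\,
\sqrt{\left[(X^{\top}X)^{-1}\right]_{jj}}

% Coefficient of determination
R^{2} = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}
```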
It is well accepted that the chemicals ‘H2S’ and ‘lactic acid’ contribute significantly to the good taste of cheddar cheese. To investigate whether ‘acetic acid’ also affects the taste of cheddar cheese, two multiple linear regression models were fitted to the ‘taste’ data, yielding the following results:
Explanatory variables
H2S, lactic |
acetic, H2S, lactic |
(g) Decide which one is the reduced model. Then fill in the following ANOVA table to compare the nested models.
(h) Calculate the p-value associated with the significance of ‘acetic acid’. Do you think ‘acetic acid’ should be included in the multiple linear regression model?
(i) Regress ‘taste’ on ‘acetic acid’ alone and test at the 5% level for the significance of ‘acetic acid’ under this simple linear regression model. Does your conclusion contradict that given in (h)? Comment.
(6 marks)
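The comparison of nested models in (g) and (h) rests on the extra-sum-of-squares F statistic. A minimal sketch of that calculation is given below. The function name and the RSS values are hypothetical and are not taken from the cheese data; only n = 30, the four coefficients of the full model and the single dropped coefficient follow from the question.

```python
def nested_f_statistic(rss_reduced, rss_full, n, p_full, q):
    """Extra-sum-of-squares F statistic for a reduced model nested in a full model.

    rss_reduced, rss_full: residual sums of squares of the two fitted models
    n:      number of observations
    p_full: number of coefficients in the full model (including the intercept)
    q:      number of coefficients set to zero in the reduced model
    """
    if rss_full <= 0 or q <= 0 or n <= p_full:
        raise ValueError("need rss_full > 0, q > 0 and n > p_full")
    if rss_reduced < rss_full:
        raise ValueError("the reduced model cannot have a smaller RSS")
    numerator = (rss_reduced - rss_full) / q    # mean square explained by the extra terms
    denominator = rss_full / (n - p_full)       # residual mean square of the full model
    return numerator / denominator


# Hypothetical RSS values for illustration only (NOT the cheese results).
f_stat = nested_f_statistic(rss_reduced=120.0, rss_full=100.0, n=30, p_full=4, q=1)
print(round(f_stat, 3))
```

Under the null hypothesis the statistic follows an F distribution with q and n - p_full degrees of freedom, so the p-value is its upper tail probability; in R this could be obtained with `pf(f_stat, q, n - p_full, lower.tail = FALSE)`. When q = 1 the statistic equals the square of the t statistic for the dropped coefficient in the full model.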
2. A dataset concerns the price per capita of beef annually from 1925 to 1941 together with other variables relevant to an economic analysis of the price of beef. It contains the following variables:
• YEAR = Year to which the data refer;
• PFO = Retail food price index;
• DINC = Disposable income per capita index;
• CFO = Food consumption per capita index;
• RDINC = Index of real disposable income per capita;
• RFP = Retail food price index adjusted by the CPI;
• PBE = Price of beef (cents/lb).
An EXCEL spreadsheet containing the above data is available on Blackboard.
A multiple linear regression model Ω is proposed to describe the relationship between the response variable PBE and the other 6 explanatory variables (YEAR, PFO, DINC, CFO, RDINC, RFP).
An agriculturalist believes, however, that the variation in PBE can be adequately explained by the variable CFO alone, and hence proposes a simple linear regression model ω for the data.
(a) Specify the models Ω and ω, and state the model assumptions clearly.
(b) Calculate the residual sums of squares from fitting Ω and ω respectively.
(c) Explain why in (b) the residual sum of squares of Ω is not larger than that of ω.
(d) Under model Ω, test at the 10% level whether the regression coefficient of DINC is equal to 2, and state your conclusion.
(e) Suppose that we predict the explanatory variables to take the following values in the years 2015 and 2018:
Year   PFO     DINC    CFO     RDINC   RFP
2015   200.0   200.0   200.0   220.0   2000.0
2018   190.0   210.0   210.0   210.0   2100.0
Calculate the prediction interval of the change of PBE from 2015 to 2018.
(f) It is suggested that the relationship between PBE and CFO depends on the year in which the data were collected, i.e. on the variable YEAR. Answer the following questions.
(1) Propose a suitable model in which model ω is nested, and explain why it is suitable.
(2) Denote the model proposed in (1) above by Ω1. Carry out a hypothesis test to compare ω against Ω1 and state your conclusion.
(3) Based on the fitted model Ω1, plot four fitted regression lines on the same diagram to display the relationships between PBE and CFO in the years 1925, 1930, 1935 and 1940, respectively.
Comment on the changes in the relationship between PBE and CFO across the period 1925–1940, i.e. the years leading up to the Second World War.
(14 marks)
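For part (e), one way to set up a prediction interval for a change between two future responses is sketched below. It assumes that the future errors in 2015 and 2018 are independent of each other and of the sample errors, with the same variance σ². The symbols x_{2015}, x_{2018}, d and the level 1 - α are notation assumed here; the question does not state the level. Under model Ω the data give n = 17 (years 1925 to 1941) and p = 7.

```latex
% Regressor vectors (each with a leading 1 for the intercept) and their difference
d = x_{2018} - x_{2015}
\quad\text{(the intercept entry of } d \text{ is } 0\text{)}

% Point prediction of the change in PBE
\hat{\Delta} = d^{\top}\hat{\beta}

% Variance of the prediction error: two new independent errors plus estimation error
\operatorname{Var}\!\left(\Delta - \hat{\Delta}\right)
  = \sigma^{2}\left(2 + d^{\top}(X^{\top}X)^{-1}d\right)

% 100(1-\alpha)% prediction interval
\hat{\Delta} \;\pm\; t_{n-p;\,1-\alpha/2}\;\hat{\sigma}\,
\sqrt{\,2 + d^{\top}(X^{\top}X)^{-1}d\,}
```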
2022-11-19