Statistics 2 - Assignment
NOTE: For Question 1a), you are expected to follow the Data Analysis Algorithm shown on page 1 of Workbook 3 from the EBBRMS course, and to cover both the confidence interval and the hypothesis testing approaches. Remember to check assumptions where possible, state the null hypotheses when interpreting p-values, interpret confidence intervals appropriately, and give your interpretation of the clinical, scientific or wider relevance of the results. To give you an idea of the kind of approach to take, the file "EBBRMS Assignment-ExampleQuestionAndAnswer" in the EBBRMS Moodle Assignment section does this for a t-test based analysis. Remember, though, that you are being asked to do a multiple regression and not a t-test, so your answer will differ from the ExampleQuestionAndAnswer in many ways.
Question 1
The datafile Attendance.xls contains information on 34 Australian Football League (AFL) matches played in the state of Victoria. The variables you have are:
Attend: Attendance at the match, in thousands.
Temp: Temperature – the forecast maximum for the day of the match, in °C.
Other: Attendance at other matches, in thousands – the sum of attendances at other AFL matches in Victoria on the same day as the match in question.
Members: Membership – the sum of the memberships (season tickets) of the two clubs whose teams were playing the match in question, in thousands.
Top50: The number of top 50 players (the best 50 players in the AFL) playing in the match in question.
a) Investigate the factors that affect attendance at these AFL matches. Specifically, carry out a multiple regression analysis with Attend as the response and Other, Temp, Members and Top50 as (potential) predictors.
[24 marks] [Note: If you wish, you can use Best Subsets Regression or Stepwise Regression to help decide which predictors to include (see the Multiple and Logistic Regression lecture notes and the answers to Workbook further question 4 for more details). Either Minitab or SPSS can be used to do this.
HINTS: Ensure that you –
• produce suitable graphs and summary statistics to aid your subjective impression; [4]
• show how you decided which of the potential predictors should be included in the model; [5]
• check assumptions underlying the regression in your final model; [7]
• give and interpret the results from your model. Also calculate and interpret 95% confidence interval(s) for the coefficient(s) of the variable(s) in your model (use a calculator [even though you can get these from SPSS and Minitab]). [8]
NB: the t-distribution critical value you will need for your confidence interval(s) depends on the number of predictors in your final model, because the residual degrees of freedom are n − p − 1 = 34 − p − 1, where p is the number of predictors. It will be one of:
t-distribution with 32 degrees of freedom: 2.037;
t-distribution with 31 degrees of freedom: 2.040;
t-distribution with 30 degrees of freedom: 2.042;
t-distribution with 29 degrees of freedom: 2.045.
Choose the one appropriate to your final model.]
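As an illustration of the calculator step, the 95% interval for a single regression coefficient is b ± t* × SE(b), with t* taken from the list above for the residual degrees of freedom n − p − 1. The short Python sketch below assumes n = 34 matches and a model that includes an intercept; the function names and the numbers in the usage line are hypothetical and are not results from Attendance.xls.

```python
# Critical values quoted in the assignment, keyed by residual degrees of freedom.
T_CRIT = {32: 2.037, 31: 2.040, 30: 2.042, 29: 2.045}


def residual_df(n, p):
    """Residual degrees of freedom for a regression with an intercept and p predictors."""
    return n - p - 1


def coef_ci(b, se, n=34, p=1):
    """95% CI for one coefficient: estimate b plus or minus t* times its standard error."""
    t_crit = T_CRIT[residual_df(n, p)]
    margin = t_crit * se
    return b - margin, b + margin


# Hypothetical estimate and standard error, for illustration only.
lower, upper = coef_ci(b=0.5, se=0.1, p=2)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With two predictors the residual degrees of freedom are 31, so t* = 2.040 and the margin is 2.040 × 0.1 = 0.204.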
b) Some of the matches were televised live; the television company showing the games chose which matches to broadcast before the start of the season.
(i) If we wished to see whether we could model the chance of a match being televised using Members, what method of analysis should be used? [1 mark]
(ii) Why would it not make sense to see whether Attend, Temp and Other affected the chance of a game being broadcast? [1 mark]
Question 2
In a case-control study of risk factors for liver cancer, 185 people with liver cancer (cases) and 197 controls were asked about their smoking habits. Of these, 61 of the cases and 39 of the controls currently smoked or had previously smoked.
a) Calculate an odds ratio and its 95% confidence interval (note: the critical value of the normal distribution that you will need is 1.96), and then interpret them.
[9 marks]
[HINT: Use a calculator to find the OR and its 95% CI. Do not forget to appropriately interpret these results as well, including assessing the clinical relevance, as at most 5 marks are available for the calculations.]
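For reference, one common way to get an odds ratio and its 95% CI by hand is the log (Woolf) method: OR = ad/(bc), SE(ln OR) = √(1/a + 1/b + 1/c + 1/d), and the interval is exp(ln OR ± 1.96 × SE). The Python sketch below assumes this method (check that it matches the formula in your course notes) and a 2×2 table laid out as a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls, where the unexposed counts are found by subtracting the smokers from each group total. The function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI for a 2x2 table using the log (Woolf) method.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the interval on the log scale, then transform back.
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_est, lo, hi = odds_ratio_ci(a=10, b=20, c=5, d=25)
print(f"OR = {or_est:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself; an interval that excludes 1 corresponds to rejecting the null hypothesis of no association at the 5% level.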
2022-03-19