
STAT 614, Statistical Methods, Fall 2022. Final Exam

Due: 6:00pm, Tuesday, December 13

STAT 614, Statistical Methods, Final Exam Part I.

Descriptions of several studies are given below. For each of Studies 1, 2, 3, and 4, identify the following:

(a) The research objectives of the study.

(b) The individuals (a.k.a. cases, samples, subjects, observations, etc.) of the data set. What is the population about which this study can be used to draw inferences?

(d) Which variables are quantitative (i.e., numerical)? Which are qualitative (i.e., categorical)?

(e) When applicable, clearly state the response variable and the predictor variables (i.e., the explanatory/independent variables).

(f) The statistical method and the initial theoretical model needed to address the research questions of interest.

(g) Briefly explain how to use the above model to address the research questions of interest.

(h) The assumptions of the model.

You do not need to conduct any data analysis in Part I.

Example: Amphetamine effect on rats. Researchers were interested in the effects of an amphetamine on the behavior of rats. Before the study began, 96 “thirsty” rats were trained to press a lever to obtain water. Each rat was randomly assigned one of three doses of the drug (i.e., the amphetamine) or a placebo (DRUG). One hour after the drug injection, an experimental session began in which the rat received water after pressing the lever a pre-specified number of times. Half of the rats in each dose group received water after two presses of the lever, while the other half received water after five presses (PRS = number of presses). The researchers recorded each rat's lever press rate, LPR (the total number of lever presses divided by the elapsed time in seconds).

The primary research question was whether the drug affected the lever press rate LPR. Also of interest was whether the number of presses required to obtain water (PRS) has an impact on LPR, and whether it impacts how drug dose affects LPR. The data are in the file rat2.csv:

• RAT: ID of the rat.

• DRUG: Drug dose. 1 = placebo, 2 = low, 3 = medium, 4 = high.

• PRE: Number of presses required to obtain water. 1 = 2 presses, 2 = 5 presses.

• LPR: Lever press rate, a positive continuous measure.

Answers:

(a) The objectives of the study are: “The primary research question was whether the drug affected the lever press rate LPR. Also of interest was whether the number of presses required to obtain water (PRS) has an impact on LPR, and whether it impacts how drug dose affects LPR.”

(b) The individuals in this study are the 96 rats. The population about which the study can be used to draw inferences is all rats.

(d) Drug is categorical (ordinal). Prs is numerical, but is treated as categorical. LPR is numerical.

(e) LPR is the response variable, since it is the outcome of Drug and Pre. Drug and Pre are the explanatory variables.

(f) Multiple linear regression (or two-way ANOVA) will be appropriate for the analysis. Interaction will be needed to address whether PRS “impacts how drug dose affects LPR.”

• Let X1, X2, X3 be dummy variables (indicators) for Drug.

  - X1 = 1 if Drug=2, and 0 otherwise.

  - X2 = 1 if Drug=3, and 0 otherwise.

  - X3 = 1 if Drug=4, and 0 otherwise.

• Let X4 be the dummy variable for Pre. X4 = 1 if Pre=2, and 0 otherwise.

• Consider a regression model:

LPR = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + β5 (X1 X4) + β6 (X2 X4) + β7 (X3 X4) + ε

(g) We can test the significance of the interactions (H0: β5 = β6 = β7 = 0) to address whether PRS “impacts how drug dose affects LPR.” The effect of amphetamine on LPR can be evaluated by assessing the significance of DRUG. We may also fit a non-interaction model and construct the confidence intervals for the slopes of DRUG indicators.

(h) The assumptions of the model are:

• ε ~ N(0, σ²). (The error terms are normally distributed, independent, and have constant variance.)

• The explanatory variables and the response variable are related through a linear function.

• Computational issues (optional): no extreme outliers.
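
The model in (f) and the test in (g) can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame is simulated rather than read from rat2.csv, the column names DRUG, PRS, and LPR are assumed, and the effect sizes are made up.

```r
# Simulated stand-in for rat2.csv: 96 rats, 4 doses x 2 press requirements,
# 12 rats per cell. All numbers are invented for illustration only.
set.seed(1)
rats <- expand.grid(rep = 1:12, DRUG = 1:4, PRS = 1:2)
rats$LPR <- exp(rnorm(nrow(rats),
                      mean = 0.1 * rats$DRUG - 0.2 * (rats$PRS == 2),
                      sd = 0.3))

# Treat both predictors as categorical. R then builds the dummy variables
# X1..X4 and their products itself, with level 1 as the baseline.
rats$DRUG <- factor(rats$DRUG)
rats$PRS  <- factor(rats$PRS)

full     <- lm(LPR ~ DRUG * PRS, data = rats)  # includes X1X4, X2X4, X3X4
additive <- lm(LPR ~ DRUG + PRS, data = rats)  # interaction terms dropped

# Partial F-test of H0: beta5 = beta6 = beta7 = 0
interaction_test <- anova(additive, full)
print(interaction_test)

# Non-interaction model: confidence intervals for the DRUG indicator slopes
print(confint(additive))
```

Comparing the two nested fits with anova() gives the 3-degree-of-freedom test of the interaction; a small p-value would suggest that the dose effect depends on the press requirement.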

Study 1. Trauma recovery. Researchers at a trauma center wished to develop a program to help brain-damaged trauma victims regain an acceptable level of independence. The objective of their study was to compare different types of psychiatric treatment. Each subject was assigned to one of 4 different psychiatric treatment groups. There were eighteen subjects in each group. The researchers measured the number of months elapsing between initiation of therapy and time at which the patient was able to function independently. The data are in the file TraumaRec.csv.
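
One candidate analysis for a design like Study 1 (one numerical response, one four-level grouping factor) is a one-way ANOVA, which could be sketched in R as below. The data are simulated, and the column names treatment and months are hypothetical; they are not taken from TraumaRec.csv.

```r
# Simulated stand-in: 4 treatment groups with 18 subjects each.
# Group means and spread are invented for illustration only.
set.seed(2)
trauma <- data.frame(
  treatment = factor(rep(1:4, each = 18)),
  months    = round(rnorm(72, mean = rep(c(10, 12, 9, 11), each = 18), sd = 2), 1)
)

# One-way ANOVA: do mean recovery times differ across treatments?
fit <- aov(months ~ treatment, data = trauma)
print(summary(fit))

# Pairwise comparisons with a family-wise error adjustment
print(TukeyHSD(fit))
```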

Study 2. CO2 levels over time. Scientists at a research station in Brotjacklriegel, Germany recorded CO2 levels, in parts per million, in the atmosphere. CO2 levels for each day from the start of April (day = 91 is April 1st, day = 92 is April 2nd, etc.) through November of a given year were collected with the goal of characterizing how CO2 levels change over time. Data are in the file CO2.csv. Make a scatterplot of CO2 levels versus day. Find and evaluate a model that captures the main trend in this scatterplot. Include the estimated trend in your scatterplot. At what day does your fitted model estimate the minimum (mean) CO2 level? What month and day does this correspond to? Give a 95% confidence interval for the population mean CO2 level on this day.
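
If the trend turns out to be roughly U-shaped, a quadratic model CO2 = β0 + β1 day + β2 day² + ε has its minimum at day = -β1 / (2 β2) when β2 > 0. The sketch below illustrates this on simulated data. The column names day and CO2 are assumed, the numbers are invented, and 2021 is used only as an arbitrary non-leap year for converting a day number to a calendar date.

```r
# Simulated stand-in for CO2.csv: days 91 (April 1) to 334 (November 30).
set.seed(3)
co2 <- data.frame(day = 91:334)
co2$CO2 <- 340 + 0.0012 * (co2$day - 220)^2 + rnorm(nrow(co2), sd = 2)

# Quadratic trend in day
fit <- lm(CO2 ~ day + I(day^2), data = co2)
b <- coef(fit)

# Vertex of the fitted parabola: the day with the lowest estimated mean CO2
day_min <- unname(-b["day"] / (2 * b["I(day^2)"]))

# Scatterplot with the estimated trend overlaid
plot(CO2 ~ day, data = co2)
lines(co2$day, fitted(fit), lwd = 2)

# Calendar date for the rounded day number (day 1 = January 1)
print(format(as.Date(round(day_min) - 1, origin = "2021-01-01"), "%B %d"))

# 95% confidence interval for the mean CO2 level at that day
ci <- predict(fit, newdata = data.frame(day = day_min),
              interval = "confidence", level = 0.95)
print(ci)
```

Note that interval = "confidence" targets the population mean at that day; interval = "prediction" would instead give a wider interval for a single new observation.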

Study 3. Lead exposure. The data in lead3vNoNA.csv were collected from a group of children who lived near a lead smelter in El Paso, Texas. Researchers are interested in the effects of exposure to lead on the neurological well-being of children. The response variable measured is the number of finger-wrist taps in the dominant hand in a 10-second trial, MAXFT. MAXFT is used as a measure of neurological function in the children, but one issue with such data in children is that it is often strongly related to age. Even slight age differences between the exposed and control groups could explain differences between the groups in neurological function. Thus we would like to adjust for age in a model that examines the association between exposure and neurological function. It is also of interest whether the effect of exposure varies with age.

The variables in the data set are:

• age: age in years

• GROUP: exposure group, where 1 = control, 2 = currently exposed and 3 = previously exposed

• MAXFT: number of finger-wrist taps in the dominant hand in a 10-second trial.
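
Adjusting a group comparison for a numerical covariate, and asking whether the group effect changes with that covariate, is commonly handled by comparing nested regression models. A minimal sketch on simulated data follows; the variable names match the list above, but the sample size, group labels, and effect sizes are assumptions made for illustration.

```r
# Simulated stand-in for lead3vNoNA.csv: 30 children per exposure group.
set.seed(4)
lead <- data.frame(
  age   = round(runif(90, 5, 15), 1),
  GROUP = factor(rep(1:3, each = 30),
                 labels = c("control", "current", "previous"))
)
lead$MAXFT <- round(20 + 3 * lead$age - 4 * (lead$GROUP == "current") +
                      rnorm(90, sd = 5))

interaction_fit <- lm(MAXFT ~ age * GROUP, data = lead)  # slopes differ by group
adjusted_fit    <- lm(MAXFT ~ age + GROUP, data = lead)  # common slope
age_only_fit    <- lm(MAXFT ~ age, data = lead)          # no exposure effect

# Does the effect of exposure vary with age? (tests the age:GROUP terms)
slopes_test <- anova(adjusted_fit, interaction_fit)
print(slopes_test)

# Is exposure associated with MAXFT after adjusting for age?
print(anova(age_only_fit, adjusted_fit))
```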

Study 4. Air pollution and mortality rate. Data relating air pollution to mortality rates for various standard metropolitan statistical areas in the United States are given in the data set pollution.txt. The data consist of ten predictor variables, listed below, and one dependent variable, the total age-adjusted mortality rate per 100,000. Researchers are interested in finding a good predictive model (or set of models) for mortality. (Having too many explanatory variables is not always desirable for prediction.)

The data set pollution.txt consists of the following ten predictor variables and one response:

• X1: Mean annual precipitation in inches

• X2: Mean January temperature in degrees F

• X3: Mean July temperature in degrees F

• X4: Population per household

• X5: Median school years completed by those over 25

• X6: Percent of housing units that are sound and with all facilities

• X7: Population per sq. mile in urbanized areas

• X8: Percent non-white population in urbanized areas

• X9: Relative pollution potential of sulphur dioxide

• X10: Annual average of percent relative humidity at 1 pm

• Y: Total age-adjusted mortality rate per 100,000.
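
With ten candidate predictors, one common way to look for a smaller predictive model is stepwise selection by AIC. The sketch below uses backward elimination with R's step() on simulated data; this is one possible technique, not necessarily the one intended for the study. The sample size of 60 and the choice of which predictors matter are arbitrary.

```r
# Simulated stand-in for pollution.txt: ten predictors X1..X10 and response Y.
# Only X1 and X9 truly affect Y in this made-up example.
set.seed(5)
n <- 60
pollution <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(pollution) <- paste0("X", 1:10)
pollution$Y <- 900 + 30 * pollution$X1 + 25 * pollution$X9 + rnorm(n, sd = 20)

# Start from the full model and drop terms while AIC improves
full     <- lm(Y ~ ., data = pollution)
selected <- step(full, direction = "backward", trace = 0)

print(formula(selected))
print(summary(selected)$adj.r.squared)
```

Other criteria (BIC via k = log(n) in step(), or best-subsets methods) can lead to different models, so it is worth comparing a few candidates rather than relying on a single run.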

This is the end of Part I.

Please continue to Part II.

STAT 614, Statistical Methods, Final Exam Part II.

First, run the following R code: (your AU student ID should be a 7-digit number.)

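set.seed(1234567)  # placeholder: replace 1234567 with your own 7-digit AU ID

sample(1:4, 1)     # draws the number (1 to 4) of the study you will analyze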

Include your code and its output in your Part II answer.

The above code picks which study you will analyze. Conduct a complete analysis to address the questions of interest of the study and write a report.

• The main body of the report should be no longer than 3 pages.

• Only include relevant output or graphs that support your analysis.

• Save your report as YourName Stat614Final2.pdf.

Your report should include, but is not limited to, the following:

(1) An exploratory data analysis.

(2) Set up and fit the initial model needed to address the research questions of interest.

(3) State and assess the assumptions of the model. Provide supporting evidence for each assumption. Be clear about which tool is being used to assess which assumption.

(4) If a transformation is needed, implement the transformation. Fit the model again and reassess the assumptions. If a transformation is NOT needed, do not transform the data.

(5) Address the specific questions of interest in the chosen Study. You may need to fit additional models in this process. Clearly state which question you are addressing and give supporting evidence. (You can use the full data set, i.e., do not remove any potential outliers from the data set even if you have identified them. However, you should incorporate the transformations you recommend.)

(6) A summary of your overall conclusions about the study. Include a brief discussion of the strengths and weaknesses of the data and your analysis when applicable.
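
Steps (3) and (4) can be sketched in R as follows, pairing each diagnostic plot with the assumption it is meant to check. The data are simulated with a deliberately right-skewed response, the names x and y are hypothetical, and the log transformation is just one common choice.

```r
# Simulated data whose response is log-linear in x (invented for illustration).
set.seed(6)
d <- data.frame(x = runif(80, 1, 10))
d$y <- exp(0.5 + 0.2 * d$x + rnorm(80, sd = 0.3))

fit_raw <- lm(y ~ x, data = d)       # initial model on the original scale
fit_log <- lm(log(y) ~ x, data = d)  # refit after a log transformation

par(mfrow = c(2, 2))

# Residuals vs fitted: checks linearity and constant variance
plot(fitted(fit_raw), resid(fit_raw), main = "Raw: residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: checks normality of the errors
qqnorm(resid(fit_raw), main = "Raw: normal Q-Q")
qqline(resid(fit_raw))

# Same two checks after the transformation
plot(fitted(fit_log), resid(fit_log), main = "Log: residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(resid(fit_log), main = "Log: normal Q-Q")
qqline(resid(fit_log))

print(coef(fit_log))
```

Independence of the errors usually cannot be read from these plots alone; it is mostly judged from how the data were collected.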

Note that:

• In Study 2, CO2 levels over time, you should include both day and day² in the analysis. Use your findings from the exploratory analysis or the diagnostics to justify why.

• In Study 4, Air pollution and mortality rate, be sure to implement a variable selection technique to build a model with fewer explanatory variables.

• Do not include output unless you refer to it in your report.

• In addition to the report, submit one, and only one, R file that can replicate your analysis. Only include the relevant code that reproduces the results you have in the report. Make sure your file runs without error. Another R user should be able to replicate your results by only changing the file path. Name the file YourName Stat614Final2code.Rmd or .R.

2022-12-12
