
Data Analysis Skills: Practice Class Test Marking Scheme

Task 1. Report on Data Analysis

● Appropriate Title and Student Number 1 MARK

Please Note: the code chunks and the mathematical LaTeX code ($ and $$) have been included in Task 1 to show you how the output included in the report was generated. In the final .pdf file, the code chunks and the LaTeX code between $ or $$ delimiters SHOULD NOT BE SHOWN for Task 1 (but should be shown for Task 2).

library(tidyverse)
library(moderndive)
library(skimr)
library(kableExtra)
library(gridExtra)

cats <- read.csv("cats.csv")

Introduction

● Introduction to the data being analysed and to the question of interest. Marks deducted for copying the data description as given. 2 MARKS

Exploratory Data Analysis

● Summary statistics on heart weight by sex with appropriate comments. One mark removed if the output is simply ‘copy-pasted’ from R.

cats %>%
  group_by(Sex) %>%
  summarise(n = n(), Mean = round(mean(Hwt), digits = 1), St.Dev = round(sd(Hwt), digits = 1),
            Min = min(Hwt), Q1 = quantile(Hwt, 0.25), Median = median(Hwt),
            Q3 = quantile(Hwt, 0.75), Max = max(Hwt)) %>%
  kable(caption = '\\label{tab:summaries} Summary statistics on heart weight by sex of 144 adult cats.') %>%
  kable_styling(latex_options = "hold_position")

Table 1: Summary statistics on heart weight by sex of 144 adult cats.

Sex	n	Mean	St.Dev	Min	Q1	Median	Q3	Max
F	47	9.2	1.4	6.3	8.35	9.1	10.1	13.0
M	97	11.3	2.5	6.5	9.40	11.4	12.8	20.5

Alternatively, the summary table could be produced using the skimr package:

my_skim <- skim_with(base = sfl(n = length))

cats %>%
  group_by(Sex) %>%
  select(Hwt, Sex) %>%
  my_skim() %>%
  transmute(Variable = skim_variable, Sex = Sex, n = n, Mean = numeric.mean, SD = numeric.sd,
            Min = numeric.p0, Q1 = numeric.p25, Median = numeric.p50, Q3 = numeric.p75,
            Max = numeric.p100, IQR = numeric.p75 - numeric.p25) %>%
  kable(caption = '\\label{tab:summary} Summary statistics on heart weight by sex. (produced using skimr package).',
        booktabs = TRUE, linesep = "", digits = 2) %>%
  kable_styling(font_size = 10, latex_options = "hold_position")

Table 2: Summary statistics on heart weight by sex. (produced using skimr package).

Variable	Sex	n	Mean	SD	Min	Q1	Median	Q3	Max	IQR
Hwt	F	47	9.20	1.36	6.3	8.35	9.1	10.1	13.0	1.75
Hwt	M	97	11.32	2.54	6.5	9.40	11.4	12.8	20.5	3.40

2 MARKS
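As a cross-check on the IQR column, the interquartile range of each group should equal Q3 minus Q1. The base-R sketch below assumes that `cats.csv` contains the same 144 cats as the `cats` data frame in the MASS package (columns `Sex`, `Bwt`, `Hwt`). It reads that data with `MASS::cats` instead of calling `library(MASS)`, so `MASS::select()` cannot mask `dplyr::select()`.

```r
# Assumption: cats.csv holds the same data as MASS::cats (Sex, Bwt, Hwt).
# Accessing MASS::cats without library(MASS) keeps dplyr::select() unmasked.
cats <- MASS::cats

# Per-sex quartiles of heart weight; quantile() defaults to type 7,
# which we assume matches skimr's p25/p75 columns.
q1 <- tapply(cats$Hwt, cats$Sex, quantile, probs = 0.25)
q3 <- tapply(cats$Hwt, cats$Sex, quantile, probs = 0.75)

# The interquartile range is Q3 - Q1, not Q3 - median.
iqr_by_sex <- tapply(cats$Hwt, cats$Sex, IQR)

print(round(cbind(Q1 = q1, Q3 = q3, IQR = iqr_by_sex), 2))
```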

● Comments on the summary statistics related to the question of interest. 1 MARK

● Boxplot of heart weight by sex. One mark removed if the plot is not appropriately labelled and the axis labels are not adjusted accordingly.

~~~{r boxplot, out.width = '68%', fig.align = "center", fig.cap = "\\label{fig:box} Heart weight by Sex.", fig.pos = 'H'}
ggplot(cats, aes(x = Sex, y = Hwt)) +
  geom_boxplot() +
  labs(x = "Sex", y = "Heart weight (grams)",
       title = "Heart weights of 144 adult cats")

~~~

Figure 1: Heart weight by Sex. [Boxplot titled "Heart weights of 144 adult cats", with Sex on the x-axis.]

3 MARKS

● Comments on the boxplot related to the question of interest. 2 MARKS
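When commenting on the boxplot, it helps to know which cats are drawn as individual points. A minimal sketch, again assuming `cats.csv` matches `MASS::cats`. The helper `flag_outliers()` is hypothetical (not part of any package) and applies the 1.5 × IQR rule around type-7 quartiles, which to our understanding is the default rule `geom_boxplot()` uses for outlying points.

```r
# Assumption: cats.csv holds the same data as MASS::cats (Sex, Bwt, Hwt).
cats <- MASS::cats

# Hypothetical helper: TRUE for values more than coef * IQR below Q1
# or above Q3 (type 7 quartiles).
flag_outliers <- function(y, coef = 1.5) {
  q <- quantile(y, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  y < q[[1]] - coef * iqr | y > q[[2]] + coef * iqr
}

# ave() applies the helper within each sex; because Hwt is numeric the
# result comes back as 0/1, so convert it to logical.
is_out <- as.logical(ave(cats$Hwt, cats$Sex, FUN = flag_outliers))

# Cats beyond the whiskers, by sex.
print(cats[is_out, c("Sex", "Hwt")])
```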

Formal Data Analysis

● State the linear regression model being fitted, i.e.

$$\widehat{\mbox{Hwt}} = \widehat{\alpha} + \widehat{\beta}_{\mbox{Male}} \cdot \mathbb{I}_{\mbox{Male}}(x)$$

where

● the intercept $\widehat{\alpha}$ is the mean heart weight for the baseline category of Females;

● $\widehat{\beta}_{\mbox{Male}}$ is the difference in the mean heart weight of Males relative to the baseline category of Females; and

● $\mathbb{I}_{\mbox{Male}}(x)$ is an indicator function such that

$$\mathbb{I}_{\mbox{Male}}(x)=\left\{
\begin{array}{ll}
1 ~~~ \mbox{if Sex of} ~ x \mbox{th observation is Male},\\
0 ~~~ \mbox{Otherwise}.\\
\end{array}
\right.$$

2 MARKS
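The interpretation of $\widehat{\alpha}$ and $\widehat{\beta}_{\mbox{Male}}$ above can be checked directly in R. This sketch assumes `cats.csv` matches `MASS::cats`, where `Sex` has levels `F` and `M`, so Females are the baseline. It shows that `lm()` builds the indicator $\mathbb{I}_{\mbox{Male}}(x)$ as the design-matrix column `SexM`, that the intercept equals the female mean, and that the `SexM` coefficient equals the male mean minus the female mean.

```r
# Assumption: cats.csv holds the same data as MASS::cats (Sex, Bwt, Hwt).
cats <- MASS::cats

model <- lm(Hwt ~ Sex, data = cats)

# lm() constructs the indicator automatically: column "SexM" of the design
# matrix is 1 for Males and 0 for the baseline level, Females.
X <- model.matrix(model)
print(table(Sex = cats$Sex, SexM = X[, "SexM"]))

# Compare the fitted coefficients with the group means.
group_means <- tapply(cats$Hwt, cats$Sex, mean)
alpha_hat <- coef(model)[["(Intercept)"]]
beta_hat  <- coef(model)[["SexM"]]

print(c(alpha_hat = alpha_hat,
        female_mean = group_means[["F"]],
        beta_hat = beta_hat,
        male_minus_female = group_means[["M"]] - group_means[["F"]]))
```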

● Report the estimated model coefficients. One mark removed if the regression output is simply ‘copy-pasted’ from R.

model <- lm(Hwt ~ Sex, data = cats)
table_values <- get_regression_table(model)

table_values %>%
  # Note: the dplyr:: prefix appears to be needed here, presumably because
  # another package's select() (e.g. MASS::select, if loaded) can mask dplyr's.
  dplyr::select(term, estimate, lower_ci, upper_ci, p_value) %>%
  kable(caption = '\\label{tab:reg} Estimates of the parameters from the fitted linear regression model.',
        col.names = c("Term", "Estimate", "CI Lower Bound", "CI Upper Bound", "p value"),
        align = rep('c', 5)) %>%
  kable_styling(latex_options = "HOLD_position")

Table 3: Estimates of the parameters from the fitted linear regression model.

Term	Estimate	CI Lower Bound	CI Upper Bound	p value
intercept	9.202	8.560	9.845	0
Sex: M	2.121	1.338	2.904	0

4 MARKS

● Appropriate comments on the regression coefficients and the difference between males and females. 4 MARKS
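For comments on the difference between males and females, the confidence interval for $\beta_{\mbox{Male}}$ in Table 3 is the estimate plus or minus a $t$ quantile times its standard error. The sketch below, assuming `cats.csv` matches `MASS::cats`, computes that interval by hand and compares it with `confint()`. The `lower_ci` and `upper_ci` columns of `get_regression_table()` are assumed to be this same 95% interval.

```r
# Assumption: cats.csv holds the same data as MASS::cats (Sex, Bwt, Hwt).
cats <- MASS::cats
model <- lm(Hwt ~ Sex, data = cats)

# Estimate and standard error of beta_Male from the coefficient table.
coefs <- summary(model)$coefficients
est <- coefs["SexM", "Estimate"]
se  <- coefs["SexM", "Std. Error"]

# 95% interval by hand: estimate +/- t quantile on the residual df (n - 2).
t_crit <- qt(0.975, df = df.residual(model))
ci_by_hand <- est + c(-1, 1) * t_crit * se

# The same interval from confint().
ci_confint <- confint(model, parm = "SexM", level = 0.95)

print(rbind(by_hand = ci_by_hand, confint = as.numeric(ci_confint)))
```

An interval lying entirely above zero is what supports a comment that males have, on average, heavier hearts than females.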

NB: THE DIAGNOSTICS IN THE REMAINDER OF THIS ANALYSIS SECTION SHOULD NOT BE INCLUDED IN THE CLASS TEST SINCE THESE PLOTS (MOSTLY) SUPPORT THE ASSUMPTIONS OF THE FITTED MODEL

● Plots for checking model assumptions.

~~~{r residplots, echo=FALSE, fig.width = 13, fig.align = "center", fig.cap = "\\label{fig:resids} Scatterplots of the residuals by Sex (left) and a histogram of the residuals (right).", fig.pos = 'H', message = FALSE}
regression.points <- get_regression_points(model)

p1 <- ggplot(regression.points, aes(x = Sex, y = residual)) +
  geom_jitter(width = 0.1) +
  labs(x = "Sex", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue")

p2 <- ggplot(regression.points, aes(x = residual)) +
  geom_histogram(color = "white") +
  labs(x = "Residual")

grid.arrange(p1, p2, ncol = 2)

~~~
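The residual plots can be complemented by a quick numerical check. A sketch assuming `cats.csv` matches `MASS::cats`: because Sex is the only explanatory variable, each fitted value is the mean heart weight for that cat's sex, so the residuals average to zero within each sex, and their standard deviations by sex summarise the spread shown in the left-hand plot.

```r
# Assumption: cats.csv holds the same data as MASS::cats (Sex, Bwt, Hwt).
cats <- MASS::cats
model <- lm(Hwt ~ Sex, data = cats)
res <- residuals(model)

# Each fitted value equals the mean heart weight of that cat's sex.
fitted_vs_means <- all.equal(unname(fitted(model)), ave(cats$Hwt, cats$Sex))
print(fitted_vs_means)

# Hence the residuals average to (numerically) zero within each sex.
print(tapply(res, cats$Sex, mean))

# Residual spread by sex: a numeric companion to the left-hand plot
# when judging whether the variance looks similar across groups.
print(tapply(res, cats$Sex, sd))
```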