
Reproducible Writing Exercise

STA302H1F 2022. Section: LEC5101

The problem

In 2004, the state of North Carolina released to the public a large data set containing information on births recorded in the state. This data set has been of interest to medical researchers who study the relationship between the habits and practices of expectant mothers and the birth of their children. The data provided here are a random sample of 1,000 cases from this data set.

Now a state official from North Carolina wants to investigate whether low birth weight is associated with several demographic characteristics of the mother. The official is interested in the following variables:

• mage: Mother's age in years

• fage: Father's age in years

• mature: Maturity status of the mother

• gender: Gender of the baby

• habit: Smoking habits

• marital: Marital status at birth

• weight: Weight of the baby in pounds

• visits: Number of hospital visits during pregnancy

The dataset can be found in the R package openintro. You can use the following code to install the package, load the package, load the dataset, and view the dataset in RStudio.

install.packages('openintro')  ## Install the package. Only needs to be run once

library(openintro)  ## Load the package

data(ncbirths)  ## Load the data

View(ncbirths)  ## View the data

For the analysis, the state official wants you to perform the following tasks.

• Create a unique identifier for all the cases.

• Create a new variable by dividing mother's age into age groups, such as “≤ 15”, “15-20”, “20-25”, “25-30”, “30-35”, “35-40”, and “> 40”.

• Create a new variable by dividing father's age into age groups, such as “≤ 15”, “15-20”, “20-25”, “25-30”, “30-35”, “35-40”, and “> 40”.

• Create a variable representing whether a baby has weight ≤ 5 pounds. Assign the labels “low” and “high” birth weight.

• Only include mothers with age > 15 years.

• Identify and remove any missing data. However, also make sure that you are not removing too many observations. If, while deleting missing observations, you realize that there are too many missing values for one of the variables, then you may want to delete that variable instead. However, mention that in the instructions.

• Delete the variables fage, mage, weight, lowbirthweight, weeks, premie.

•  Make sure all the observations have appropriate values for each of the variables.
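As a rough illustration of the tasks above, here is a minimal sketch in base R on a small made-up data frame (not the real ncbirths data). The object names `births`, `kept`, `complete`, and `cleaned`, and the new column names, are hypothetical. The grouping assumes right-closed intervals, so that an age of exactly 20 falls in "15-20"; the task list does not say how boundary ages should be treated, and the order of the steps shown is only one possibility.

```r
## Made-up example data; the column names mirror those in ncbirths
births <- data.frame(
  mage   = c(14, 16, 20, 27, 35, 41),
  fage   = c(NA, 19, 24, 30, 38, 45),
  weight = c(4.5, 5.0, 7.2, 6.8, NA, 8.1)
)

## Unique identifier: one running number per case
births$id <- seq_len(nrow(births))

## Age groups (assumption: right-closed intervals, e.g. 20 falls in "15-20")
age_breaks <- c(-Inf, 15, 20, 25, 30, 35, 40, Inf)
age_labels <- c("<=15", "15-20", "20-25", "25-30", "30-35", "35-40", ">40")
births$mage_group <- cut(births$mage, breaks = age_breaks,
                         labels = age_labels, right = TRUE)
births$fage_group <- cut(births$fage, breaks = age_breaks,
                         labels = age_labels, right = TRUE)

## Birth weight indicator: "low" if weight <= 5 pounds, otherwise "high"
## (a missing weight stays missing)
births$weight_group <- ifelse(births$weight <= 5, "low", "high")

## Keep only mothers older than 15
kept <- births[births$mage > 15, ]

## Count missing values per variable before deciding what to remove
na_counts <- colSums(is.na(kept))
print(na_counts)

## Drop cases with any missing value, then drop the original variables
complete <- kept[complete.cases(kept), ]
cleaned <- complete[, !(names(complete) %in% c("mage", "fage", "weight"))]
print(cleaned)
```

Counting missing values per variable before removing cases makes it easier to see whether a single variable accounts for most of the missing data.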

The state official has asked for the dataset to include no information beyond what they require for their analysis (i.e. it should require no further cleaning or modifications).

They also ask you to write out step-by-step instructions for how you cleaned the dataset in the event that they need to clean an updated dataset. You will need to provide step-by-step details as well as justifications for why each step is being completed as it is (i.e. why you are doing it in a particular way, at a particular step in the cleaning process, etc.). You will submit both the cleaned dataset and the dataset provided to you by the client, along with your instructions (up to 500 words).

Note: your instructions must not include any R code, including references to specific functions or using R function names as if they will be understood by the client. The marking scheme is on the following page.