Psy148a Final Project Formal Proposal

Introduction:

ADHD has been formally recognized as a disorder in the DSM since the 1960s. According to the CDC, boys are twice as likely as girls to be diagnosed with ADHD. This project studies the relationship between age, sex, and the prevalence of ADHD/ADD diagnoses among children. A significant relationship would be indicative of changes in how ADHD/ADD “symptoms” are defined and diagnosed.

The study aims to explore these research questions:

Is there a statistically significant difference in ADHD/ADD diagnosis rates between children of different biological sex?

Is the ADHD/ADD diagnosis rate related to the age and sex of the child?

The following hypothesis is proposed:

Hypothesis: It is expected that boys (biologically based males) will have a higher diagnosis rate of ADHD/ADD than girls (biologically based females) in all three age groups.

There are three constructs (variables) in this project: age, sex, and the diagnosis of ADHD/ADD. Age is a categorical variable based on the child’s chronological age. Sex is also a categorical variable, based on biological birth sex (female or male). The construct “ADHD” will be tested through the variable named “K2Q31B”, which records the diagnosis of the subjects.

Data:

The dataset for this study comes from the Data Resource Center for Child and Adolescent Health, and the survey questions are based on the National Survey of Children’s Health (NSCH). The NSCH collects information on children’s health in all 50 states and the District of Columbia, covering multiple demographic aspects of children’s lives as well as the presence of health conditions. The target population for this study is children 3-17 years old, and the data were collected in 2021. The dataset is not public; access to the data download requires submitting a request.

The variables to be measured include age, sex, and the diagnosis of ADHD. In the original dataset, “age” is a categorical variable ranging from 0 to 17; each child’s age was derived from their birth date information.

“Sex” is also a categorical variable, separated into two groups: “1” for males and “2” for females. Missing information is recorded as “99”. All of this information was reported by the survey respondent.

The diagnosis of ADD/ADHD is measured through the variable named “K2Q31B”. For question K2Q31B, respondents were asked to choose “yes” or “no” to the question: “Has a doctor or other health care provider told you that this child CURRENTLY has Attention Deficit Disorder or Attention Deficit/Hyperactivity Disorder, that is, ADD or ADHD?”. Respondents who answered “yes” were coded as “1” and those who answered “no” were coded as “2”; missing information is recorded as “99”, and children ages 0-2 are recorded as “95”. As the wording of the question shows, the answer is reported by the adult who filled out the survey about the child, not by the child.
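As a minimal sketch of how this coding scheme could be handled in R, the codes can be turned into labeled factors so that “95” and “99” become NA automatically. The vectors below are made-up illustration values, not NSCH data, and the object names are placeholders.

```r
# Made-up codes that follow the scheme described above (not real NSCH data).
k2q31b <- c(1, 2, 2, 95, 99, 1)   # 1 = yes, 2 = no, 95 = ages 0-2, 99 = missing
sex    <- c(1, 2, 1, 2, 99, 2)    # 1 = male, 2 = female, 99 = missing

# Values that match no listed level (here 95 and 99) become NA.
adhd_f <- factor(k2q31b, levels = c(1, 2), labels = c("yes", "no"))
sex_f  <- factor(sex, levels = c(1, 2), labels = c("male", "female"))

# Frequency tables, including the NA count.
table(adhd_f, useNA = "ifany")
table(sex_f, useNA = "ifany")
```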

The download link for the dataset is http://www.childhealthdata.org/dataset/download?rq=14143

Plan:

The analytic plan for this project consists of the following steps:

1. Secure the data and read the data into R.

- Send the request for data access

- Look at the folder containing the original data and description materials provided by the organization

- Save the dataset file to the R working directory and rename the file with all of our initials

- Read the data (.csv file) into R using the “read.csv” function
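The reading step could look roughly like the sketch below. Because the real file requires an access request, the example first writes a tiny made-up .csv to a temporary folder and then reads it back. The file name and the “age” and “sex” column names are placeholders (the proposal only names “K2Q31B”).

```r
# Toy stand-in for the NSCH download (made-up rows, placeholder column names).
toy <- data.frame(
  age    = c(1, 5, 9, 14, 16),
  sex    = c(1, 2, 1, 2, 99),
  K2Q31B = c(95, 2, 1, 2, 1)
)
csv_path <- file.path(tempdir(), "nsch_2021_toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the .csv file into the object named in the plan.
data.2021 <- read.csv(csv_path)

# Quick structural checks before any cleaning.
str(data.2021)
head(data.2021)
```

With the real file, `csv_path` would instead point to the renamed file in the R working directory.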

2. Data screening, data cleaning, and/or data manipulation.

- Read the 2021 data file into object “data.2021”

- Data screening: go through the description of each variable and select the target variables for our analysis based on our research questions

- Remove all columns except for variables that we need

- Rename variables being used

- Data cleaning: remove any NA or missing data

- Subset all observations with values “95” and “99” (as described in the instructions for the dataset)

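- Remove the missing data (subsetted in the last step) by keeping only the rows whose values are not “95” or “99”, using logical indexing or the function “subset()” (the function “rm()” deletes whole objects from the workspace, not rows)

- Check to see if there are any other missing values using is.na()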
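The screening and cleaning steps above can be sketched as follows. The data frame is a made-up stand-in for “data.2021”, and the column names other than “K2Q31B” (as well as the new name “diagnosis”) are placeholders chosen for illustration.

```r
# Made-up stand-in for data.2021 (not real NSCH data).
data.2021 <- data.frame(
  age    = c(1, 4, 7, 10, 13, 16, 8),
  sex    = c(1, 2, 1, 99, 2, 1, 2),
  K2Q31B = c(95, 2, 1, 2, 99, 1, 2),
  other  = c(0, 0, 0, 0, 0, 0, 0)    # a column we do not need
)

# Remove all columns except the variables we need, then rename.
adhd <- data.2021[, c("age", "sex", "K2Q31B")]
names(adhd)[names(adhd) == "K2Q31B"] <- "diagnosis"

# Flag rows coded 95 (ages 0-2) or 99 (missing) and keep the rest.
keep <- !(adhd$sex %in% c(95, 99)) & !(adhd$diagnosis %in% c(95, 99))
adhd <- adhd[keep, ]

# Check for any other missing values.
anyNA(adhd)
colSums(is.na(adhd))
```

Logical indexing is used here instead of `rm()` because the goal is to drop rows while keeping the data frame itself.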

3. Univariate analyses: description of each variable (e.g. numeric summary and/or graphical presentation).

- Descriptive statistics: since we are looking at ADHD/ADD diagnosis in 2021, we will present some descriptive stats for ADHD/ADD diagnosis: the mean and median age of children with an ADHD/ADD diagnosis.

- Graphical presentation: we could use a boxplot of age with two boxes (children diagnosed with ADHD/ADD and children without)
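A rough sketch of these univariate summaries is shown below, again on made-up values. It assumes a cleaned data frame with placeholder columns “age” and “diagnosis”, and saves the boxplot to a temporary PDF file.

```r
# Made-up cleaned data (not real NSCH data).
adhd <- data.frame(
  age       = c(4, 7, 9, 12, 15, 17, 6, 11),
  diagnosis = factor(c("no", "yes", "no", "yes", "yes", "no", "no", "yes"),
                     levels = c("yes", "no"))
)

# Counts and proportions for the diagnosis variable.
table(adhd$diagnosis)
prop.table(table(adhd$diagnosis))

# Mean and median age within each diagnosis group.
mean_age   <- tapply(adhd$age, adhd$diagnosis, mean)
median_age <- tapply(adhd$age, adhd$diagnosis, median)

# Boxplot of age with one box per diagnosis group, saved to a file.
pdf(file.path(tempdir(), "age_by_diagnosis.pdf"))
boxplot(age ~ diagnosis, data = adhd,
        xlab = "ADHD/ADD diagnosis", ylab = "Age (years)")
dev.off()
```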

4. Bivariate analyses: numeric summary (e.g. correlation) and graphical presentation (e.g. bivariate scatterplot).

- Correlation analysis (see details for each hypothesis in step 5)

- Graphical presentation:

- Either a boxplot or a histogram with an empirical curve (to be decided), with a separate graph for each age range. For each age range, mark the highest point of diagnosis.

- Place the histograms together in one graph to compare and contrast the data.
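One way the bivariate summary could be sketched is a table of diagnosis rates by sex and age group, plotted as grouped bars in a single graph. The data are made up, and the three age-group boundaries (3-7, 8-12, 13-17) are an assumption for illustration, since the proposal does not define the groups.

```r
# Made-up cleaned data (not real NSCH data); diagnosed: 1 = yes, 0 = no.
adhd <- data.frame(
  age       = c(3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 16, 17),
  sex       = factor(rep(c("male", "female"), times = 6),
                     levels = c("male", "female")),
  diagnosed = c(0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0)
)

# Hypothetical age groups: (2,7], (7,12], (12,17].
adhd$age_group <- cut(adhd$age, breaks = c(2, 7, 12, 17),
                      labels = c("3-7", "8-12", "13-17"))

# Diagnosis rate (mean of the 0/1 indicator) for each sex x age group cell.
rate <- tapply(adhd$diagnosed, list(adhd$sex, adhd$age_group), mean)

# Age group with the highest diagnosis rate for each sex.
peak_group <- apply(rate, 1, function(r) names(which.max(r)))

# Grouped bars in one graph, so the sexes can be compared within age groups.
pdf(file.path(tempdir(), "rate_by_age_sex.pdf"))
barplot(rate, beside = TRUE, legend.text = TRUE,
        xlab = "Age group", ylab = "Diagnosis rate")
dev.off()
```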

5. Statistical model for each research question (the analytic plan for statistical modeling may be updated later, by Nov. 28).

- Sex x diagnosis: the null hypothesis (H0) is that the diagnosis rate of boys (biological birth male) is lower than or equal to the diagnosis rate of girls (biological birth female), i.e., μ(boys diagnosed with ADHD) ≤ μ(girls diagnosed with ADHD). The alternative hypothesis (H1) is that the diagnosis rate of boys is higher than the diagnosis rate of girls, i.e., μ(boys diagnosed with ADHD) > μ(girls diagnosed with ADHD). A directional (one-tailed) t-test will be used to test this hypothesis.
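The directional test described above might be run along these lines. The 0/1 vectors are made up for illustration (1 = diagnosed, 0 = not diagnosed), so the output says nothing about the real data. Note that `t.test()` uses the Welch (unequal variance) version by default.

```r
# Made-up diagnosis indicators (not real NSCH data): 1 = diagnosed, 0 = not.
boys  <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0)
girls <- c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# One-tailed test of H1: mean(boys) > mean(girls).
res <- t.test(boys, girls, alternative = "greater")

res$estimate    # group means, i.e. the diagnosis rate in each group
res$statistic   # t statistic
res$p.value     # one-sided p-value
```

Because the diagnosis variable is binary, each group mean is a proportion. A one-sided two-proportion test with `prop.test()` is a common alternative, but using it would be a change to the plan rather than part of it.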