
GPH-GU 2480 Longitudinal Analysis of Public Health Data

Section 002

Spring 2024

COURSE DESCRIPTION:

This course covers modern methods for the analysis of repeated measures, correlated outcomes, and longitudinal data, including the unbalanced and incomplete data that are characteristic of public health research. There are four widely available methods for dealing with dependence: robust standard errors, generalized estimating equations, random effects models, and fixed effects models. This course examines each of these methods in some detail, with an eye to discerning their relative advantages and disadvantages. Different methods are considered for quantitative outcomes and categorical outcomes. The course uses Stata statistical software and gives students hands-on experience working with real data.

PRE-REQUISITES:

GPH-GU 2995 Biostatistics for Public Health

GPH-GU 2353 Regression I: Linear Regression and Modeling

COURSE REQUIREMENTS AND EXPECTATIONS:

1. Students are expected to attend all class sections. Students are expected to come to class on time to prevent disrupting the lecture and classroom activities. If you cannot attend a certain session, it is your responsibility to notify Dr. Cook or the head CA beforehand, or in the case of an emergency, immediately upon your return.

2. Classroom environment: Ideally, everyone should be involved in classroom discussions and working problems in the breakout rooms. In order for everyone to feel comfortable presenting work and voicing opinions, questions, and suggestions, a climate of tolerance and respect is essential.

3. Complete reading assignments prior to class.  Readings are listed in the course schedule on pages 9 and 10. Additional readings may be assigned as needed.

4. Participation: This class will be taught synchronously for the most part. However, due to the ongoing and ever-changing COVID-19 situation, some sessions may be taught asynchronously. Participation for this class will be evaluated based on the completion and timely submission of the class exercises assigned in each lecture. Students will work in groups, during class time or on their own, to complete the class exercises. Attendance will be taken during lab sessions led by the course assistants.

5. Final Project: For the group final project, students will complete a longitudinal data analysis project from start to finish. This will incorporate each of the lessons learned from the homework assignments with an additional component aimed at selecting a particular longitudinal model based on the provided data. In addition, all students will rate the relative contribution of each member of their Research Team at the end of the semester. More details and the rubric for this project will be provided in class 5. You will also give a final presentation on May 6th, which will count towards your final paper grade. Directions regarding the presentation will be provided in class 10.

6. Homework: All homework assignments must be uploaded to Gradescope BY the due date at 11:59PM (see course outline below). Late homework will not be accepted without the instructor’s permission. Homework should be submitted as a knit PDF file if using R, or as a PDF document if using Stata with the code copied and pasted.

a. Assignment 1 (10 pts): Explore the ICPSR website (https://www.icpsr.umich.edu/index.html). In this assignment you will explore one of the most prominent data warehouses and think critically about the types of longitudinal research questions you are interested in exploring further. We will specifically be exploring 2 data sources for this assignment – The Flint Adolescent Study (1994-1997 and 200-2003) and National Longitudinal Study of Adolescent to Adult Health (Add Health), 1994-2008. Directions are below.

i.    First, you will summarize the main point of the study in 250 words or less (see At a glance tab).

ii.   Second, you will describe the sampling design in 250 words or less (see data and documentation). For instance, were in-home surveys used? Was the survey administered online? Be sure to note any important limitations in the sampling design that were noted by researchers.

iii.   Third, select ONE publication from the “Data-related Publications” tab that is of interest to you. The publication must utilize longitudinal data. In 500 words or less, summarize the main point of the article. In your summary, make sure to identify the hypothesis (or research question), identify the predictor(s) and outcome(s), identify what descriptive statistics were utilized, identify the longitudinal modelling procedure that was conducted, and discuss the findings of the analysis. Make sure also to note what statistical program the data were analyzed in (if not stated, write in your assignment that this information was not present).

b. Assignment 2 (10 pts): In this assignment you will describe your data. We will provide you with one dataset. In your submission provide your well-organized code and a Word document of your answers to the questions below. If tables or figures are required to answer any of the questions below, please attach the table(s) or figure(s) underneath the specified question. NOTE: You may need to do some light-moderate cleaning. In addition, equations, figures, and tables do not count towards the word/sentence limit.

i.   Your outcome variable is “CES-D Depression Score - 10 Item Scale”. You will select ONE additional predictor variable. Also, please define your “time” variable (e.g., age, wave, etc.). You should also state how the predictor and outcome are operationalized. (2-3 sentences)

ii.   Write the equation and define the coefficients. Also note the differences between the multi-level model and the composite model.

1.    Level-1 model and Level-2 model

2.    Composite model

iii.    In “wide” form describe your data utilizing the descriptive tools learned in class. (2-3 sentences).

1.    Descriptive statistics

2.    Profile plots

iv.    In “long” form describe your data utilizing the descriptive tools learned in class. (4-5 sentences)

1.    Empirical growth plots

2.    Non-parametric standardization

3.    Parametric standardization

4.    Individual OLS regressions conducted and visualized with the mean trajectory line.

v.   Obtain the descriptive statistics and answer the following questions (350 words or less):

1.    What are the sample means of the estimated intercepts and slopes?

2.    What are the sample variances (and the corresponding standard deviations) of the estimated intercepts and slopes?

3.    What is the correlation (i.e., the standardized covariance) between the estimated intercepts and slopes?

4.    Without having conducted the full multi-level model what are your initial thoughts concerning the intercept and slope of the level one model and the level 2 model?
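The descriptive steps in Assignment 2 (moving from wide to long form, fitting an OLS line to each person, and summarizing the estimated intercepts and slopes) can be sketched as follows. This is only an illustration in Python using the standard library; coursework is submitted in Stata or R. The records, the variable names (`cesd0`, `cesd1`, `cesd2`), and the helper functions are all hypothetical and are not taken from the course dataset.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical wide-form records: one row per person, one CES-D column per wave.
wide = [
    {"id": 1, "cesd0": 4.0, "cesd1": 6.0, "cesd2": 8.0},
    {"id": 2, "cesd0": 10.0, "cesd1": 9.0, "cesd2": 8.0},
    {"id": 3, "cesd0": 7.0, "cesd1": 7.0, "cesd2": 10.0},
]

def wide_to_long(rows, stub="cesd"):
    """Return one record per person per wave (long form)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(stub):
                wave = int(key[len(stub):])  # the suffix after the stub is the wave
                long_rows.append({"id": row["id"], "time": wave, "cesd": value})
    return long_rows

def ols_line(times, scores):
    """OLS intercept and slope for a single person's trajectory."""
    t_bar, y_bar = mean(times), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys))
    return num / den

long_rows = wide_to_long(wide)

# Fit a separate regression line to each person.
fits = {}
for pid in sorted({r["id"] for r in long_rows}):
    person = [r for r in long_rows if r["id"] == pid]
    fits[pid] = ols_line([r["time"] for r in person], [r["cesd"] for r in person])

intercepts = [b0 for b0, _ in fits.values()]
slopes = [b1 for _, b1 in fits.values()]

print("mean intercept:", mean(intercepts), "| mean slope:", mean(slopes))
print("variance (SD) of intercepts:", variance(intercepts), stdev(intercepts))
print("variance (SD) of slopes:", variance(slopes), stdev(slopes))
print("correlation of intercepts and slopes:", pearson(intercepts, slopes))
```

Note that the variance and the standard deviation are reported separately (the latter is the square root of the former), and that the correlation is the covariance rescaled to lie between -1 and 1.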

c. Assignment 3 (10 pts): In this exercise we are learning about conditional growth models. With some of these models we would like to understand how a level 2 covariate influences changes in trajectory over time. In this class exercise we will walk through how different aspects of race and biological sex can change trajectories of mental health over time among adolescents. Key to this assignment is understanding the different interpretations of our findings as we utilize different configurations of our covariates. If tables or figures are required to answer any of the questions below, please attach the table(s) or figure(s) underneath the specified question. NOTE: You may need to do some light-moderate cleaning. In addition, equations, figures, and tables do not count towards the word/sentence limit.

DATA: “The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of over 20,000 adolescents who were in grades 7-12 during the 1994-95 school year, and have been followed for five waves to date, most recently in 2016-18. Over the years, Add Health has collected rich demographic, social, familial, socioeconomic, behavioral, psychosocial, cognitive, and health survey data from participants and their parents; a vast array of contextual data from participants’ schools, neighborhoods, and geographies of residence; and in-home physical and biological data from participants, including genetic markers, blood-based assays, anthropometric measures, and medications. Ancillary studies have added even more data over the years. Data from the project are available in various forms and have been analyzed in thousands of publications in peer-reviewed journals.” https://addhealth.cpc.unc.edu/

Your data consists of the first 5 waves of data. Your dataset has id, time, race (see 5 categories), biological sex, and a race*biological sex variable, and the ces-d (depression scale). The data is already in long form. You hypothesize that there are differences between racial and biological sex groups on depression trajectories. We will explore where and how these differences present (in real research you would have more precise hypotheses).

1.    Write the level 1 equation

2.    Write the level 2 equation

3.    Write the composite equation

4.    Run the unconditional mean and unconditional growth models predicting trajectories of depression

a.    Interpret

5.    Run 3 separate conditional models (one with race, one with biological sex, one with the interaction term).

a.    Interpret

b.   What are the differences between the interpretations of the slope between each model? (i.e., do you notice that if you do not look at the interaction between race and biological sex you miss its importance?).

c.    Discuss how statisticians must be diligent about making sure that populations are not rendered invisible by the statistics we conduct.
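As a point of reference for items 1-3 of Assignment 3, a linear conditional growth model with one level 2 covariate is often written in the general form below. This is a sketch under stated assumptions, not the required answer: the notation follows a common multilevel-model-for-change convention, and X_i stands for a single hypothetical binary indicator (for example, biological sex). A categorical covariate with five levels, such as race in this dataset, would need four indicator variables, each with its own pair of coefficients.

```latex
% Level 1 (within person): person i measured at occasion j
Y_{ij} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij}

% Level 2 (between persons): the covariate X_i predicts initial status and rate of change
\pi_{0i} = \gamma_{00} + \gamma_{01} X_i + \zeta_{0i}
\pi_{1i} = \gamma_{10} + \gamma_{11} X_i + \zeta_{1i}

% Composite: substitute the level 2 equations into level 1
Y_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{TIME}_{ij} + \gamma_{01} X_i
       + \gamma_{11}\, X_i\,\mathrm{TIME}_{ij}
       + \left[ \zeta_{0i} + \zeta_{1i}\,\mathrm{TIME}_{ij} + \varepsilon_{ij} \right]
```

In the composite form, \gamma_{11} appears as the coefficient on the cross-level interaction X_i × TIME, which is why the effect of a level 2 covariate on the slope is read from an interaction term.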

d. Assignment 4 (10 pts): In this assignment you will conduct a selected random coefficient model to evaluate trajectories of change using the outcome you selected in assignment #2. You will need to select at least one additional time-varying covariate to include in your model. In this assignment you will provide your annotated code as well as a Word document with a write-up of your process and the results.

i.       Your outcome variable is “CES-D Depression Score - 10 Item Scale”. You will select ONE additional predictor variable (in addition to the covariate you selected in assignment #2). You should also state how the predictors are operationalized. (2-3 sentences).

ii.       Conduct the unconditional mean model

1.    Interpret the fixed and random effects

2.    Conduct the ICC and interpret

iii.       Conduct the unconditional growth model

3.    Interpret the fixed and random effects

4.    Graph the unconditional growth model

iv.       Conduct a growth model with the main IV only (IV selected in assignment 2)

5.    Interpret the fixed and random effects

6.    Graph the growth model

v.       Conduct a growth model with the main IV and at least one additional time-varying covariate

7.    Interpret the fixed and random effects

8.    Graph the growth model

vi.       Using the fit statistics learned in class (i.e. Likelihood, Deviance and AIC/BIC) assess the model fit between the 4 models conducted. Which is the best model and why?
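The ICC and the fit statistics named in Assignment 4 reduce to simple arithmetic once a model has been estimated. The sketch below is illustrative only: the log-likelihoods, parameter counts, variance components, and sample size are made-up numbers, and the function names are hypothetical. In practice these quantities come from your Stata or R output. Software packages differ in the n used for BIC (number of observations versus number of persons), so check what your software reports.

```python
import math

def icc(between_var, within_var):
    """Intraclass correlation from an unconditional means model:
    the share of total outcome variance that lies between persons."""
    return between_var / (between_var + within_var)

def deviance(log_likelihood):
    """Deviance is -2 times the model log-likelihood."""
    return -2.0 * log_likelihood

def aic(log_likelihood, n_params):
    """AIC = deviance + 2 * (number of estimated parameters)."""
    return deviance(log_likelihood) + 2 * n_params

def bic(log_likelihood, n_params, n_obs):
    """BIC = deviance + ln(n) * (number of estimated parameters)."""
    return deviance(log_likelihood) + n_params * math.log(n_obs)

# Hypothetical results for four models: (log-likelihood, number of parameters).
models = {
    "unconditional means": (-5210.4, 3),
    "unconditional growth": (-5150.9, 6),
    "growth + main IV": (-5139.2, 8),
    "growth + main IV + time-varying covariate": (-5137.8, 9),
}
n_obs = 2400  # hypothetical number of person-period records

for name, (ll, k) in models.items():
    print(f"{name}: deviance={deviance(ll):.1f} "
          f"AIC={aic(ll, k):.1f} BIC={bic(ll, k, n_obs):.1f}")

best_by_aic = min(models, key=lambda m: aic(*models[m]))
print("lowest AIC:", best_by_aic)
print("ICC with hypothetical variance components:", icc(12.0, 20.0))
```

Smaller AIC and BIC values indicate better fit after penalizing for complexity. Deviance differences are usually compared only between nested models fitted to the same data by full maximum likelihood.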

e. Assignment 5 (10 pts): This assignment will focus on non-linear change models. Based on the data provided, you will select the best model for the data and report your findings. In this assignment you will provide your annotated code as well as a Word document with a write-up of your process and the results. You will utilize the “Assignment4 dataset.dta” Stata dataset. This is a modified Add Health data source. Please use a significance level of .10 for this class exercise (i.e., report associations meeting the p<.10 threshold instead of the usual p<.05). You are a researcher who would like to understand how differences in the mother’s marital status of 7th-12th graders influence trajectories of self-reported general health. For this exercise you will conduct random-effects and population-average models and report significant coefficients in each model. In addition, for some models you will also be asked to report and discuss the predicted probabilities at certain levels of the covariates in the models. Lastly, you will describe differences between the models and then select the “best” model. Your outcome variable is “self-rated general health”. Your main predictor variable is “mother marital status W1.” You will utilize the time variable labeled “time.” This variable corresponds to measurement occasion. The time variable corresponds to an average sample age of 15 at baseline (7th-12th grade), age 16 at the first follow-up, and (about 6 years later) age 21 at the second follow-up period. You will control for age, biological sex, and level of education of the mother at baseline. Additional details concerning the assignment can be found below.

i.   Conduct a population average logistic regression model

1.    Interpret the overall model and the significant coefficients

ii.   Conduct a random intercept logistic regression model

1.    Interpret the overall model and the significant coefficients

2.    Interpret the intraclass correlation

3.    Describe key differences between this model and the population average logit model performed in part i

4.    Sex differences in predicted probabilities for those with mothers with less than a high school education

a.    Graph the predicted probabilities of the effect of mothers’ marriage status at baseline on trajectories of general health for 15-year-old female participants who have a mother with less than a high school education. Interpret the graph.

b.    Graph the predicted probabilities of the effect of mothers’ marriage status at baseline on trajectories of general health for 15-year-old male participants who have a mother with less than a high school education. Interpret the graph.

c.    What are the notable differences between the two graphs?

5.    Sex differences in predicted probabilities for those with mothers with a college degree or higher

a.    Graph the predicted probabilities of the effect of mothers’ marriage status at baseline on trajectories of general health for 15-year-old female participants who have a mother with a college degree or higher (i.e. post graduate degree). Interpret the graph.

b.    Graph the predicted probabilities of the effect of mothers’ marriage status at baseline on trajectories of general health for 15-year-old male participants who have a mother with a college degree or higher (i.e. post graduate degree). Interpret the graph.

c.    What are the notable differences between the two graphs?

iii.   Summarize your overall thoughts concerning the main research question. In your response, also highlight any notable findings from the predicted probabilities.
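The quantities requested in Assignment 5 (predicted probabilities, the intraclass correlation for a random intercept logit model, and the contrast between subject-specific and population-average coefficients) can be sketched as follows. This is illustrative Python with entirely hypothetical coefficients and covariate names (`married`, `time`, `female`). It assumes the outcome has been coded as binary, and it is not the course's Stata workflow. The attenuation formula is a well-known approximation for a normally distributed random intercept, not an exact result.

```python
import math

def inv_logit(x):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_predictor(coefs, values):
    """Intercept plus the sum of coefficient * covariate value."""
    return coefs["_cons"] + sum(coefs[name] * values[name] for name in values)

def latent_icc(random_intercept_var):
    """ICC on the latent scale for a random intercept logit model.
    The level 1 residual variance of the standard logistic is pi^2 / 3."""
    return random_intercept_var / (random_intercept_var + math.pi ** 2 / 3)

def approx_population_average(beta_subject_specific, random_intercept_var):
    """Approximate the population-average coefficient implied by a
    subject-specific (random intercept) coefficient."""
    c2 = (16 * math.sqrt(3) / (15 * math.pi)) ** 2  # roughly 0.346
    return beta_subject_specific / math.sqrt(1 + c2 * random_intercept_var)

# Hypothetical random intercept logit estimates (log-odds scale).
coefs = {"_cons": -1.0, "married": 0.5, "time": -0.2, "female": 0.3}
sigma2_u = 1.5  # hypothetical random intercept variance

# Predicted probabilities for a female participant with random intercept 0,
# by mother's marital status at each measurement occasion.
for married in (0, 1):
    for time in (0, 1, 2):
        eta = linear_predictor(coefs, {"married": married, "time": time, "female": 1})
        print(f"married={married} time={time} p={inv_logit(eta):.3f}")

print("latent-scale ICC:", round(latent_icc(sigma2_u), 3))
print("approximate population-average coefficient for married:",
      round(approx_population_average(coefs["married"], sigma2_u), 3))
```

The probabilities above are conditional on a random intercept of zero, so they describe a typical individual and are not population-average probabilities. The last line illustrates why coefficients from the population average model tend to be closer to zero than those from the random intercept model.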

NOTE: All homework assignments must be typed (print: 1” margins; Times New Roman 12pt or Arial 11pt font; Stata output in Courier New 9pt font), no smaller and no larger, no exceptions!

Stata code should be copied and pasted into the document. R markdown assignments must be knit. Assignments must be submitted as a PDF. Calculations may be neatly handwritten (calculations using the “Equation” function in Microsoft Word are preferred). Your NAME AND SECTION NUMBER must be on the top of each page that you hand in. If I cannot read your answers, I cannot grade your work.

NOTE: If you have questions about grades on any assignment or exam, speak to Dr. Cook within 3 days of receiving said grade.  After this timeframe, she will not entertain grading disputes.

GRADING RUBRIC:

Item                               Percentage
Homework (5)                       50
Final Project                      30
Peer Evaluation (Final Project)    10
Class Participation                10

GRADING SCALE:

A:     94-100
A-:    90-93
B+:    87-89
B:     83-86
B-:    80-82
C+:    77-79
C:     73-76
C-:    70-72
D+:    67-69
D:     60-66
F:     <60