APH 417: Statistical Computing Using SAS

Project 2

Instructions: Attach or copy your SAS code at the end of your answer to each question. No code, no grade! Include a cover page that states your student ID for assignment submission. Be concise and include only what is asked for (i.e., SAS output and a brief explanation of your results). The file type should be either Word or PDF. This project uses the dataset “frmgham2.csv”; please read the data dictionary carefully. Word limit: 600 words.

Late penalty: 5% of the total marks will be deducted for each day past the due date. Work submitted after the 5th day (i.e., more than 120 hours past the due date) will normally receive a mark of 0.
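
The late-penalty arithmetic can be sketched as follows. This is an illustration only, written in Python rather than SAS, and it rests on assumptions not stated in this brief: any started 24-hour period counts as a full day, the total is 100 marks, and a mark cannot fall below 0. The function name `mark_after_late_penalty` is hypothetical.

```python
import math


def mark_after_late_penalty(raw_mark, hours_late, total_marks=100):
    """Apply a 5%-of-total-marks deduction per day late (assumed rounding: up)."""
    if hours_late <= 0:
        return float(raw_mark)  # submitted on time, no deduction
    if hours_late > 120:
        return 0.0  # more than 5 days late: normally a mark of 0
    days_late = math.ceil(hours_late / 24)  # assumption: partial days count as full days
    penalty = total_marks * 5 * days_late / 100
    return max(raw_mark - penalty, 0.0)  # assumption: marks are floored at 0


if __name__ == "__main__":
    for hours in (0, 30, 120, 121):
        print(hours, mark_after_late_penalty(80, hours))
```

Under these assumptions, a raw mark of 80 submitted 30 hours late loses 10 marks (two days at 5 marks each).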

Question 1: Data preparation and summary statistics.

Q1(a) Create a new dummy variable called “male” that equals 1 for males and 0 for females. Create a user-defined format and a frequency table for your “male” variable. (5 points)

Q1(b) Create a factor variable called “ag” where ag = 1 if a person is younger than 50, ag = 2 if a person is aged 50 to 60 inclusive, and ag = 3 if a person is older than 60. Create a user-defined format and a frequency table for your “ag” variable. (5 points)
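
The boundary handling in the “ag” coding rule is easy to get wrong, so a minimal sketch may help. It is written in Python for illustration only (the project itself must be completed in SAS). The function name `age_group` is hypothetical, and returning `None` for a missing age is an assumption rather than something this brief specifies.

```python
def age_group(age):
    """Map an age in years to the three-level group code described in Q1(b)."""
    if age is None:
        return None  # assumption: a missing age yields a missing group
    if age < 50:
        return 1  # younger than 50
    if age <= 60:
        return 2  # 50 to 60, both ends included
    return 3  # older than 60


if __name__ == "__main__":
    for a in (49, 50, 60, 61):
        print(a, age_group(a))
```

Ages 50 and 60 both fall in group 2, which is the point most worth checking in your own frequency table.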

Q1(c) Extract the median, 25th percentile, 75th percentile, minimum and maximum for the following variables: BMI, GLUCOSE, SYSBP and DIABP. Make sure that your summary statistics are reported separately for each gender group. (5 points)

Q1(d) Calculate the correlation coefficient between CVD and the following variables: BMI, GLUCOSE, SYSBP and DIABP. Also report the p-values associated with those correlation coefficients. (5 points)

Question 2: Hypothesis Testing.

Q2(a) Test whether the mean serum total cholesterol (TOTCHOL) differs from the standard body cholesterol level (200 mg/dL). (5 points)

Q2(b) Test whether BMI follows a normal distribution. (5 points)

Q2(c) Use a two-sample t-test to test whether mean BMI differs significantly between the two gender groups. (5 points)

Q2(d) Use the Wilcoxon-Mann-Whitney test (Wilcoxon rank-sum test) to repeat the test in (c) (Note: no EXACT statement is needed). (5 points)

Question 3: ANOVA.

Q3(a) Conduct a two-way ANOVA for TOTCHOL on male and ag (main effects only). Please also include a multiple comparison of the means based on ag using Fisher’s LSD test. (10 points)

Q3(b) Conduct a one-way analysis of TOTCHOL by the variable ag using a nonparametric test, i.e., the Kruskal-Wallis test (Note: no EXACT statement is needed). (10 points)

Question 4: Logistic regression.

To explain the risk of cardiovascular disease, we plan to run logistic regression models to identify potential risk factors associated with CVD (a dummy variable indicating cardiovascular disease). Select the following variables: CVD, MALE, AGE, BMI, CURSMOKE, BPMEDS, DIABETES, SYSBP, DIABP and educ, and exclude observations with missing values of educ. In addition, create 3 dummy variables for educ, using educ = 1 as the reference group. Save the resulting observations and variables as a new dataset, and use this dataset throughout this question.
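
The reference-group dummy coding described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the project itself must be completed in SAS). It assumes educ takes the values 1 to 4, which is implied by three dummies plus a reference group but should be checked against the data dictionary. The names `educ_dummies`, `educ2`, `educ3` and `educ4` are hypothetical.

```python
def educ_dummies(educ):
    """Return the three indicator variables for educ, with educ = 1 as reference.

    Returns None for a missing educ, signalling that the observation
    should be excluded from the analysis dataset.
    """
    if educ is None:
        return None  # observations with missing educ are dropped
    # Assumption: educ is coded 1, 2, 3, 4; level 1 gets all-zero dummies.
    return {f"educ{level}": int(educ == level) for level in (2, 3, 4)}


if __name__ == "__main__":
    for value in (1, 2, 3, 4, None):
        print(value, educ_dummies(value))
```

With this coding, each dummy's coefficient in the logistic regression is interpreted relative to the educ = 1 group.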

Q4(a) Run a logistic regression of CVD on all other variables. Provide the regression output. (5 points)

Q4(b) Run a logistic regression of CVD on the same list of variables as in (a), but with stepwise selection. Provide the regression output. (5 points)

Q4(c) Compare the models in (a) and (b) in terms of their model fit indices and predictive abilities. (5 points)

Q4(d) Report the odds ratios (along with their 95% confidence intervals) of all the variables for CVD based on the model in (a). Which ones are significant? (5 points)

Question 5: Frequency tables and categorical data analysis. Remember to include appropriate titles, formats as well as headings in your tables.

Q5(a) Suppose one is wondering whether the educational backgrounds (i.e., the variable educ) of the study participants are evenly distributed (i.e., the proportions of different educational levels are roughly the same in the observed sample). Create the corresponding table and provide the test result. (5 points)

Q5(b) Create a two-way frequency table for CVD and male. Test whether the two variables are independent and report your results. (5 points)

Q5(c) Create a two-way frequency table for CVD and ag. Test whether the two variables are independent and report your results. (5 points)

Q5(d) Create a three-way table for BMI where classes are defined by male and educ. Report the mean, minimum and maximum of the BMI for each group. (5 points)

[Learning Outcome Assessed: B and C]

Module Learning Outcomes:

A: Manipulate data sets, including entering, reading, writing, and importing data, and preparing data for analysis.

B: Produce descriptive statistics with graphics.

C: Conduct statistical analysis and produce reports.

Grading Rubrics:

Responses will be marked as follows:

Category 1: Knowledge and understanding

Category 2: Intellectual Skills

Category 3: Transferable Skills

100%

The best answer that could reasonably be expected from a student at that level of study under the prevailing conditions (i.e. exam or coursework).

90-99%

‘Outstanding’

Total coverage of the task set. Exceptional demonstration of knowledge and understanding, appropriately grounded in theory and/or research.

Outstanding, and comprehensive, justification and evaluation. Well-argued conclusions.

Extremely clear exposition. Excellently structured and logical answer. Excellent presentation, with only the most insignificant errors.

80-89%

‘Excellent’

As ‘Outstanding’ but with some minor weaknesses or gaps in knowledge and understanding.

As ‘Outstanding’ but with some minor weaknesses or gaps in justification, evaluation, and/or conclusions.

As ‘Outstanding’ but with some minor weaknesses in structure, logic and/or presentation.

70-79%

‘Very Good’

Total coverage of the task set. Generally very good demonstration of knowledge and understanding, but with some weaknesses or gaps. Good grounding in theory and/or research.

Generally very good justification and evaluation, and conclusions, but with some weaknesses or gaps.

Generally clear exposition. Satisfactory structure. Very good presentation, largely free of grammatical and other errors.

60-69%

‘Comprehensive’

As ‘Very Good’ but with more and/or more significant gaps in knowledge and understanding, and some significant gaps in grounding in theory and/or research.

As ‘Very Good’ but with more and/or more significant weaknesses in justification, evaluation and/or conclusions.

As ‘Very Good’ but with some weaknesses in exposition and/or structure and a few more grammatical and other errors.

50-59%

‘Competent’

Covers most of the task set. Patchy knowledge and understanding, with limited grounding in theory and/or research.

Patchy, with significant limitations or flaws in the justification, evaluation, and/or conclusions.

Competent exposition and structure. Competent presentation but some significant grammatical and other errors.

40-49%

‘Adequate’

As ‘Competent’ but patchy coverage of the task set and more weaknesses and/or gaps in knowledge and understanding. Just meets the threshold level.

Shows barely adequate ability to justify, evaluate or draw conclusions. Just meets the threshold level.

As ‘Competent’ but with more weaknesses in exposition, structure, presentation and/or errors. Just meets the threshold level.

35-39%

‘Compensatable fail’

Some parts of the set task may have been omitted. Major gaps in knowledge and understanding. Very limited grounding in theory and/or research. Falls just short of the threshold level.

Very limited and/or flawed justification, evaluation, and conclusions. Falls just short of the threshold level.

Somewhat confused and limited exposition. Confused structure. Some weaknesses in presentation and some serious grammatical and other errors. Falls just short of the threshold level.

20-34%

‘Deficient’

As ‘Compensatable Fail’ but with very significant omissions, flaws and/or gaps in knowledge and understanding. Falls substantially below the threshold level.

As ‘Compensatable Fail’ but with more significant limitations and/or flaws in the justification, evaluation and conclusions. Falls substantially below the threshold level.

As ‘Compensatable Fail’ but with more serious weaknesses in presentation and/or grammar. Falls substantially below the threshold level.

0-19%

‘Extremely weak’

Substantial sections of the task not covered. Knowledge and understanding are extremely limited and/or largely incorrect. No appropriate grounding in theory and/or research.

Or: the answer is substantially irrelevant to the assessment task.

Justification, evaluation and conclusions are extremely weak or omitted.

Or: the answer is substantially irrelevant to the assessment task.

Largely confused exposition and structure. Many serious grammatical and other errors.

Or: the answer is substantially irrelevant to the assessment task.

Category Scores:

Category 1:     / 100

Category 2:     / 100

Category 3:     / 100

Final score:

(Category 1 score + Category 2 score + Category 3 score) / 3 =