
ECE 608: Quantitative Methods in Biomedical Engineering

Assignment #2

OVERVIEW

The primary aim of this assignment is to give you practice with the concepts covered in the lectures. Its secondary aim is to familiarize you with R programming. There are three parts to this assignment:

1. Part 1: One R programming question (5 marks).

2. Part 2: One statistical concept question (5 marks).

3. Part 3: Four statistical analysis questions (5 marks each).

This assignment is graded out of 30 marks. It contributes 10% to your final course grade.

SUBMISSION INSTRUCTIONS

Submit your completed response document through the LEARN portal. There is a Dropbox folder for this assignment, under Submit ➔ Dropbox ➔ Problem Set #2. The deadline for submission is 10 pm on July 4, 2022 (Monday).

You are required to submit one R notebook file containing the code and your responses for Part 1 and Part 3. Make sure your code runs without any modifications; dependent libraries should be loaded at the beginning of the R notebook. You should also comment your code to clearly explain each step, in case your code does not run properly.

For Part 2, you have two options for preparing your response:

1.   You may type up your answer and calculations in a Word file.

2.   You may use pen and paper to write your response, and then scan your completed response sheets for submission. Note: If you are submitting a photo of your response sheet (or using the CamScanner app), make sure your response is clear and can be easily read.

PART 1. R Programming Questions

(5 marks in total)

1.   From lecture 3, we know that the family-wise error rate (FWER) for multiple comparisons in an experiment can be computed as

$$\mathrm{FWER} = 1 - (1 - \alpha)^{j}$$

where α is the significance level of each comparison and j is the number of comparisons. We want to investigate how conservative this FWER is by comparing the theoretical FWER to a practical one through the simulation below. We will also compare the FWER to ANOVA by using the same simulation. Write R code for each of the following steps:

(i)   Randomly draw six normally distributed samples (X1 to X6) with mean 0 and standard deviation 1 using rnorm; each sample has a sample size of 10. Perform three separate independent t-tests on X1 vs X2, X3 vs X4, and X5 vs X6 with alpha = 0.1; the null hypothesis for each t-test is that the means are the same. Count how many type I errors (i.e., rejections of the null hypothesis when it is true) the t-tests produce. Repeat this process 10000 times (i.e., 10000 realizations of six normally distributed samples and the corresponding t-tests), and count how many realizations have at least one type I error.

(ii)  To simulate a multiple comparison operation similar to that of ANOVA, repeat (i) but compare X1 to X2, X1 to X3, and X2 to X3 instead, and count how many realizations have at least one type I error.

(iii) Perform an ANOVA test on X1, X2 and X3 for each realization and count how many times you have detected a type I error with ANOVA and alpha = 0.1.

(iv) Compare the counts you obtained in (i) and (ii) to the theoretical FWER derived using the equation above and determine which one is the theoretical FWER and which one is the practical FWER. From this, briefly comment on the conservativeness of the theoretical FWER. Also, comment on the type I error rate of ANOVA compared to the practical FWER with multiple t-test comparisons.
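As a starting point, step (i) could be sketched as follows. This is a minimal illustration under stated assumptions, not a model answer: the seed, the helper name one_realization, and the use of R's default Welch t-test (t.test without var.equal = TRUE) are choices made here, not requirements from the assignment.

```r
# Sketch of step (i): three independent t-tests per realization, alpha = 0.1.
set.seed(608)          # hypothetical seed, used only for reproducibility
alpha  <- 0.1          # per-comparison significance level
n_rep  <- 10000        # number of realizations
n_obs  <- 10           # sample size of each group
n_comp <- 3            # comparisons per realization

# Hypothetical helper: TRUE when at least one of the three t-tests
# rejects a null hypothesis that is true by construction.
one_realization <- function() {
  x <- replicate(6, rnorm(n_obs, mean = 0, sd = 1))  # columns are X1..X6
  p <- c(t.test(x[, 1], x[, 2])$p.value,
         t.test(x[, 3], x[, 4])$p.value,
         t.test(x[, 5], x[, 6])$p.value)
  any(p < alpha)
}

hits <- replicate(n_rep, one_realization())
empirical_fwer   <- mean(hits)               # share of realizations with >= 1 type I error
theoretical_fwer <- 1 - (1 - alpha)^n_comp   # equation above with j = 3

c(empirical = empirical_fwer, theoretical = theoretical_fwer)
```

Steps (ii) and (iii) can reuse the same structure by changing which columns are compared, or by replacing the t-tests with a single ANOVA on X1 to X3.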

PART 2. Statistical Concepts

(5 marks in total)

2.   In the dependent t-test, the test statistic is

$$t = \frac{\sum D / N}{\sqrt{\dfrac{\sum D^{2} - \left(\sum D\right)^{2} / N}{N (N - 1)}}}$$

where D is the pair difference and N is the number of pairs.

Show that the denominator is the standard error of the mean of the pair difference D, i.e.,

$$\sqrt{\frac{\sum D^{2} - \left(\sum D\right)^{2} / N}{N (N - 1)}} = \frac{SD_{D}}{\sqrt{N}}$$

where SD_D is the standard deviation of the pair difference D.
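Before attempting the algebra, it can help to check the claim numerically. The sketch below is only a sanity check, not the requested proof. It borrows the "Before" and "After 4 weeks" columns from the cholesterol table in Question 4 as paired data, and the variable names are chosen here for illustration.

```r
# Paired data: "Before" and "After 4 weeks" cholesterol values from Question 4.
before <- c(6.42, 6.76, 6.56, 4.80, 8.43, 7.49, 8.05, 5.05, 5.77,
            3.91, 6.77, 6.44, 6.17, 7.67, 7.34, 6.85, 5.13, 5.73)
after4 <- c(5.83, 6.20, 5.83, 4.27, 7.71, 7.12, 7.25, 4.63, 5.31,
            3.70, 6.15, 5.59, 5.56, 7.11, 6.84, 6.40, 4.52, 5.13)

D <- before - after4   # pair differences
N <- length(D)         # number of pairs

# Standard error of the mean difference, written two ways.
se_direct <- sd(D) / sqrt(N)
se_comp   <- sqrt((sum(D^2) - sum(D)^2 / N) / (N * (N - 1)))

# t statistic built by hand versus the one reported by t.test().
t_manual  <- mean(D) / se_direct
t_builtin <- unname(t.test(before, after4, paired = TRUE)$statistic)

isTRUE(all.equal(se_direct, se_comp))    # both forms of the denominator agree
isTRUE(all.equal(t_manual, t_builtin))   # and reproduce the paired t statistic
```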

PART 3. Statistical Analysis

(20 marks in total, 5 marks each)

3.  A research group studied how income affects the number of cigarette smokers in different age groups in Waterloo. The researchers asked 1000 people in Waterloo about their age, income, and smoking habits; the percentages of smokers per group are shown below. Age and income were grouped into three and five categories, respectively. In this question, we will focus on the effect of age on smoking habit. Conduct a one-way between-subject ANOVA on age and percentage of smokers using R to answer the following questions. Justify your answer with test statistics and p-values.

% Smokers    Income    Age
38           1         1
42           1         2
14           1         3
41           2         1
41           2         2
16           2         3
36           3         1
39           3         2
18           3         3
32           4         1
36           4         2
15           4         3
28           5         1
33           5         2
17           5         3

Adapted from Hoaglin, D., Mosteller, F., and Tukey, J. (1991). Fundamentals of exploratory analysis of variance. Wiley, New York, page xx.

(i)   Is the main effect of age statistically significant?

(ii)  Does the ANOVA model meet all the assumptions? Check the normality and homogeneity of variance assumptions.

(iii) Conduct Tukey’s HSD to determine which group(s) differ from the other groups (assuming the main effect was significant). Plot the Tukey’s SCI to substantiate your answer.
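One possible workflow for this question is sketched below using base R only. The data frame and variable names are chosen here for illustration. Bartlett's test is used for homogeneity of variance purely as an assumption of this sketch; the lecture may prescribe a different test (for example Levene's test).

```r
# Data from the table above; age and income are treated as categorical factors.
smoke <- data.frame(
  smokers = c(38, 42, 14, 41, 41, 16, 36, 39, 18, 32, 36, 15, 28, 33, 17),
  income  = factor(rep(1:5, each = 3)),
  age     = factor(rep(1:3, times = 5))
)

# (i) One-way between-subject ANOVA of % smokers on age group.
fit <- aov(smokers ~ age, data = smoke)
anova_table <- summary(fit)[[1]]
anova_table

# (ii) Assumption checks: normality of residuals and homogeneity of variance.
shapiro.test(residuals(fit))
bartlett.test(smokers ~ age, data = smoke)

# (iii) Tukey's HSD with simultaneous confidence intervals, then plot them.
tukey <- TukeyHSD(fit, "age", conf.level = 0.95)
tukey
plot(tukey)
```

Pairs whose simultaneous confidence interval excludes zero in the plot are the ones that differ at the chosen family-wise level.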

4.  A study tested whether blood cholesterol was reduced after using margarine of brand A or B as part of a low-fat, low-cholesterol diet. The study was conducted on 18 people, with blood cholesterol measured at three time points as shown below. Conduct a one-way repeated-measure ANOVA using R on time and cholesterol with all 18 samples (i.e., ignoring margarine) to answer the questions below. Justify your answer with test statistics and p-values.

ID    Before    After 4 weeks    After 8 weeks    Margarine
1     6.42      5.83             5.75             B
2     6.76      6.20             6.13             A
3     6.56      5.83             5.71             B
4     4.80      4.27             4.15             A
5     8.43      7.71             7.67             B
6     7.49      7.12             7.05             A
7     8.05      7.25             7.10             B
8     5.05      4.63             4.67             A
9     5.77      5.31             5.33             B
10    3.91      3.70             3.66             A
11    6.77      6.15             5.96             B
12    6.44      5.59             5.64             B
13    6.17      5.56             5.51             A
14    7.67      7.11             6.96             A
15    7.34      6.84             6.82             A
16    6.85      6.40             6.29             B
17    5.13      4.52             4.45             A
18    5.73      5.13             5.17             B

Data contributed by Ellen Marshall, University of Sheffield.

(i)   Is the main effect of time statistically significant?

(ii)  Does the repeated-measure ANOVA model meet all the assumptions? Check the normality and sphericity assumptions. Adjust the test statistic and p-value in (i) if necessary.

(iii) Conduct multiple comparison t-tests with Bonferroni correction to determine which group(s) differ from the other groups (assuming the main effect was significant).
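The analysis for this question might be organized as in the sketch below, which relies only on base R. The object names are chosen here for illustration. Fitting the model with aov and an Error term, and checking sphericity with mauchly.test on a multivariate lm fit, are assumptions of this sketch rather than the method required by the course.

```r
# Cholesterol data from the table above (wide format, one row per subject).
wide <- data.frame(
  id     = factor(1:18),
  before = c(6.42, 6.76, 6.56, 4.80, 8.43, 7.49, 8.05, 5.05, 5.77,
             3.91, 6.77, 6.44, 6.17, 7.67, 7.34, 6.85, 5.13, 5.73),
  week4  = c(5.83, 6.20, 5.83, 4.27, 7.71, 7.12, 7.25, 4.63, 5.31,
             3.70, 6.15, 5.59, 5.56, 7.11, 6.84, 6.40, 4.52, 5.13),
  week8  = c(5.75, 6.13, 5.71, 4.15, 7.67, 7.05, 7.10, 4.67, 5.33,
             3.66, 5.96, 5.64, 5.51, 6.96, 6.82, 6.29, 4.45, 5.17)
)

# Long format: subjects appear in the same order within every time level,
# which the paired comparisons below depend on.
long <- data.frame(
  id   = rep(wide$id, times = 3),
  time = factor(rep(c("before", "week4", "week8"), each = nrow(wide))),
  chol = c(wide$before, wide$week4, wide$week8)
)

# (i) One-way repeated-measure ANOVA with subject as the error stratum.
rm_fit <- aov(chol ~ time + Error(id / time), data = long)
summary(rm_fit)

# (ii) Normality at each time point, then Mauchly's test of sphericity.
sapply(wide[c("before", "week4", "week8")], function(v) shapiro.test(v)$p.value)
mlm_fit  <- lm(cbind(before, week4, week8) ~ 1, data = wide)
mlm_null <- lm(cbind(before, week4, week8) ~ 0, data = wide)
mauchly  <- mauchly.test(mlm_fit, X = ~ 1)
mauchly
# Greenhouse-Geisser and Huynh-Feldt adjusted p-values, if sphericity fails.
anova(mlm_fit, mlm_null, X = ~ 1, test = "Spherical")

# (iii) Paired t-tests between time points with Bonferroni correction.
pw <- pairwise.t.test(long$chol, long$time, paired = TRUE,
                      p.adjust.method = "bonferroni")
pw
```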

5.   Based on the same dataset as in Q4, conduct a mixed two-way ANOVA on time, margarine and cholesterol using R to answer the following questions. Justify your answer with test statistics and p-values.

(i)   Are the main effects statistically significant?

(ii)  Does the mixed ANOVA model meet all the assumptions? Check the normality, homogeneity of variance and sphericity assumptions. Adjust the p-values in (i) if necessary.

(iii) Is the interaction effect between time and margarine statistically significant? Plot the cholesterol versus time and margarine to substantiate your answer.

(iv) Write an excerpt to report the results of this research in a scientific manner; it should include the    assumption checks, statistical analysis results on main effects and post-hoc analysis as discussed in the lecture.
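A mixed design with margarine as the between-subject factor and time as the within-subject factor could be set up roughly as follows. This is a hedged sketch in base R: the object names, the per-time-point Bartlett test for homogeneity of variance, and the aov formulation with an Error term are all assumptions made here for illustration.

```r
# Cholesterol data from Question 4, now including the margarine brand.
wide <- data.frame(
  id        = factor(1:18),
  margarine = factor(c("B", "A", "B", "A", "B", "A", "B", "A", "B",
                       "A", "B", "B", "A", "A", "A", "B", "A", "B")),
  before = c(6.42, 6.76, 6.56, 4.80, 8.43, 7.49, 8.05, 5.05, 5.77,
             3.91, 6.77, 6.44, 6.17, 7.67, 7.34, 6.85, 5.13, 5.73),
  week4  = c(5.83, 6.20, 5.83, 4.27, 7.71, 7.12, 7.25, 4.63, 5.31,
             3.70, 6.15, 5.59, 5.56, 7.11, 6.84, 6.40, 4.52, 5.13),
  week8  = c(5.75, 6.13, 5.71, 4.15, 7.67, 7.05, 7.10, 4.67, 5.33,
             3.66, 5.96, 5.64, 5.51, 6.96, 6.82, 6.29, 4.45, 5.17)
)

# Long format: one row per subject and time point.
long <- data.frame(
  id        = rep(wide$id, times = 3),
  margarine = rep(wide$margarine, times = 3),
  time      = factor(rep(c("before", "week4", "week8"), each = nrow(wide))),
  chol      = c(wide$before, wide$week4, wide$week8)
)

# Mixed ANOVA: margarine is tested between subjects, time and the
# margarine-by-time interaction are tested within subjects.
mixed_fit <- aov(chol ~ margarine * time + Error(id / time), data = long)
summary(mixed_fit)

# Homogeneity of variance between brands at each time point (p-values).
hov_p <- sapply(split(long, long$time),
                function(d) bartlett.test(chol ~ margarine, data = d)$p.value)
hov_p

# Sphericity of the within-subject factor.
mauchly.test(lm(cbind(before, week4, week8) ~ margarine, data = wide), X = ~ 1)

# Interaction plot: mean cholesterol over time, one line per brand.
with(long, interaction.plot(time, margarine, chol,
                            xlab = "Time", ylab = "Mean cholesterol",
                            trace.label = "Margarine"))
```

Roughly parallel lines in the interaction plot suggest little interaction, while diverging or crossing lines suggest that the change over time depends on the brand.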

6.   A study analyzed the weight gain of rats treated with three types of diet at two different amounts. The raw data points are included in the table below. Conduct a between-subject two-way ANOVA using R on diet type and diet amount, including the corresponding interaction term, to answer the questions below. Justify your answer with test statistics and p-values. Use α = 0.1 for this question.