ECE 608: Quantitative Methods in Biomedical Engineering

Assignment #1

OVERVIEW

The primary aim of this assignment is to give you practice with the concepts covered in the lectures. Its secondary aim is to familiarize you with R programming. There are two parts to this assignment:

•    Part 1: Three R programming questions (5 marks each).

•    Part 2: One statistical analysis question (10 marks).

This assignment is graded out of 25 marks. It contributes 5% to your final course grade.

SUBMISSION INSTRUCTIONS

Submit your completed response document through the LEARN portal. There is a Dropbox folder for this assignment under Submit ➔ Dropbox ➔ Problem Set #1. The deadline for submission is 10 pm on May 30, 2022 (Monday).

You are required to submit one R notebook file that contains your code and your responses to this assignment. Please make sure your code runs without any modifications; dependent libraries should be loaded at the beginning of the R notebook. You should also comment your code to clearly explain each of your steps, in case your code does not run properly.

PART 1. R Programming Questions

(15 marks in total; 5 marks per question)

1.   The central limit theorem states that, for a random sample X of size n drawn from any distribution with finite variance, the distribution of the sample mean approaches a normal distribution as n increases. To test this theorem, a simulation can be conducted by repeatedly drawing samples from the distribution and observing the distribution of the sample mean. Write R code for the steps below.

(i)  Let X be a sample of size n = 20 drawn from a Poisson distribution with λ = 3. By repeatedly drawing X 500 times (i.e. number of realizations = 500), show with a Q-Q plot that the sample mean of X is approximately normally distributed. Also test the normality of the sampling distribution of the mean using the Shapiro-Wilk test with α = 0.05.

(ii) Repeat (i) 100 times to estimate the likelihood that the sampling distribution of the mean is normally distributed.

(iii) Repeat (ii) with n = 30 and n = 100 to find the corresponding likelihoods.

(iv) Summarize the obtained likelihoods and discuss whether the observations align with the central limit theorem.
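One possible way to organize the simulation in steps (i) to (iii) is sketched below. It is a minimal illustration under stated assumptions, not a model answer: the helper names `sample_means` and `prop_normal` and the seed are hypothetical, and "likelihood" is interpreted here as the proportion of repetitions in which the Shapiro-Wilk test does not reject normality at α = 0.05.

```r
# Sketch only: helper names and the seed are illustrative assumptions.
set.seed(608)

# Draw `reps` samples of size n from Poisson(lambda) and return their means.
sample_means <- function(n, reps = 500, lambda = 3) {
  replicate(reps, mean(rpois(n, lambda)))
}

# (i) One set of 500 sample means, checked with a Q-Q plot and Shapiro-Wilk.
means_20 <- sample_means(n = 20)
qqnorm(means_20, main = "Sample means, n = 20")
qqline(means_20)
sw <- shapiro.test(means_20)
not_rejected <- sw$p.value > 0.05  # TRUE: normality not rejected at alpha = 0.05

# (ii)-(iii) Proportion of repetitions in which normality is not rejected
# (the assumed reading of "likelihood" in this sketch).
prop_normal <- function(n, outer_reps = 100, alpha = 0.05) {
  p_values <- replicate(outer_reps, shapiro.test(sample_means(n))$p.value)
  mean(p_values > alpha)
}

results <- sapply(c(20, 30, 100), prop_normal)
names(results) <- c("n=20", "n=30", "n=100")
results
```

The printed vector gives one proportion per sample size, which can then be compared across n for step (iv).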

2.   As discussed in class, the z-score and the standard error of the mean can be used to estimate the range of the population mean µ as

$$\left[\bar{x} - z\frac{S}{\sqrt{n}},\ \bar{x} + z\frac{S}{\sqrt{n}}\right].$$

Write R code to show that, with a z-score of 1.96, the likelihood that the population mean falls within the interval shown above is approximately 95%. Do so by simulating 10000 realizations of a random sample X of size 30 from a normal distribution N(0,1).
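The coverage check described above could be set up roughly as follows. This is a sketch under assumptions: the variable names and the seed are illustrative, and S is taken to be the sample standard deviation of each realization.

```r
# Sketch only: names and the seed are illustrative assumptions.
set.seed(608)

n    <- 30      # sample size
reps <- 10000   # number of realizations
z    <- 1.96    # z-score for the interval
mu   <- 0       # true population mean of N(0, 1)

# For each realization, build the interval and record whether it contains mu.
covered <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = 1)
  se <- sd(x) / sqrt(n)            # standard error of the mean, S / sqrt(n)
  lower <- mean(x) - z * se
  upper <- mean(x) + z * se
  lower <= mu && mu <= upper
})

coverage <- mean(covered)  # proportion of intervals that contain mu
coverage
```

Because S is estimated from each sample, the observed proportion at n = 30 tends to sit slightly below 95%; the gap narrows as n grows.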

3.   To help you better understand significance level (α) and power (1 − β) in quantitative methods, write R code for each of the following tasks:

(i)  Randomly draw two samples X1 and X2 of size 30 from the same normal distribution N(0,1) and conduct an independent t-test to detect whether the difference in means is statistically significant. Repeat this process 1000 times with different realizations of X1 and X2 to obtain a distribution of p-values. From the p-value distribution, determine the likelihood of rejecting the null hypothesis (i.e. that the means of X1 and X2 are equal) when it is true, at α = 0.05.

(ii) Repeat (i) but draw X2 from a normal distribution N(0.73, 1) instead. What is the likelihood of failing to reject the null hypothesis when it is false (i.e. committing a type II error) at α = 0.05? Also determine the statistical power of your test (i.e. the likelihood of rejecting the null hypothesis when it is false, 1 − β).
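A compact way to structure both tasks is sketched below, assuming a hypothetical helper `simulate_p` and an arbitrary seed. Note that `t.test` defaults to Welch's test; `var.equal = TRUE` is passed here on the assumption that the classic equal-variance independent t-test is intended.

```r
# Sketch only: the helper name and the seed are illustrative assumptions.
set.seed(608)

reps  <- 1000
n     <- 30
alpha <- 0.05

# p-values from repeated two-sample t-tests, with X1 ~ N(0, 1) and
# X2 ~ N(mean2, 1). var.equal = TRUE requests Student's t-test
# (the default would be Welch's test).
simulate_p <- function(mean2) {
  replicate(reps, {
    x1 <- rnorm(n, mean = 0, sd = 1)
    x2 <- rnorm(n, mean = mean2, sd = 1)
    t.test(x1, x2, var.equal = TRUE)$p.value
  })
}

p_null <- simulate_p(0)     # (i)  null hypothesis is true
p_alt  <- simulate_p(0.73)  # (ii) null hypothesis is false

hist(p_null, main = "p-values when the null hypothesis is true")

type1_rate <- mean(p_null < alpha)   # rejecting a true null
power      <- mean(p_alt < alpha)    # rejecting a false null, 1 - beta
type2_rate <- 1 - power              # failing to reject a false null, beta

c(type1 = type1_rate, type2 = type2_rate, power = power)
```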

PART 2. Statistical Analysis

(10 marks in total)

4.   Clara is investigating the plasma bradykininogen level in patients with Hodgkin’s disease. She believes that the plasma bradykininogen level is lower in patients with Hodgkin’s disease than in healthy individuals.

To confirm this assertion, she collected blood samples from 22 healthy volunteers and 16 patients with active Hodgkin’s disease.

(i)  What are the research question and hypothesis of Clara’s research based on the PICO framework (without timeframe)?

(ii) Check whether the provided data (assignment1-Q4.csv) meet the parametric test assumptions, including data normality, homogeneity of variance, and independence of data.

(iii) What are the null hypothesis and alternative hypothesis for the independent t-test?

(iv) Conduct an independent two-sample t-test using R with the provided data. What is the conclusion you can draw from the statistical test?

(v) Write an excerpt to report the results of this research in a scientific manner; it should include the assumption checks and the statistical analysis results as discussed in the lecture.
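The general shape of the assumption checks and the t-test in base R is sketched below on placeholder data. Everything specific here is an assumption: the column names `group` and `level` are hypothetical (the layout of assignment1-Q4.csv is not given in this document), the simulated values are arbitrary and are not the assignment data, and a one-sided alternative is assumed from Clara's directional belief.

```r
# Sketch only. The data below are simulated placeholders, NOT the real data;
# the column names `group` and `level` are hypothetical.
set.seed(608)
dat <- data.frame(
  group = factor(rep(c("healthy", "hodgkin"), times = c(22, 16))),
  level = c(rnorm(22, mean = 6, sd = 1), rnorm(16, mean = 5, sd = 1))
)
# With the real file, something like the following would replace the block
# above, after adapting the column names:
# dat <- read.csv("assignment1-Q4.csv")

# Normality within each group (Shapiro-Wilk p-values).
normality <- tapply(dat$level, dat$group, function(v) shapiro.test(v)$p.value)

# Homogeneity of variance (F test; Levene's test is a common alternative).
var_check <- var.test(level ~ group, data = dat)

# Independent two-sample t-test. Factor levels are ordered healthy, hodgkin,
# so the tested difference is healthy minus hodgkin; alternative = "greater"
# corresponds to a lower mean in the hodgkin group.
# var.equal = TRUE assumes the variance check above was passed.
tt <- t.test(level ~ group, data = dat,
             alternative = "greater", var.equal = TRUE)

normality
var_check$p.value
tt
```

Independence of data is a property of the study design (separate volunteers and patients) rather than something these tests can verify.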