
Sample Exam Questions A 2018

MATH1015

Biostatistics

1. In a medical trial, the placebo effect occurs when

(a)   the subject is scared of the trial

(b)   the subject responds to the idea of the treatment

(c)   the subject is already sick

(d)   the subject knows the investigator

(e)   there is a historical treatment group

2. In a study on smoking and lung cancer, a possible confounder is NOT

(a)   a gene for smoking      (b)   brand of cigarettes      (c)   diet

(d)   alcohol                         (e)   exercise

3. Simpson’s Paradox occurs when

(a)   relationships between percentages in subgroups are reversed when the subgroups are combined.

(b)   relationships between percentages in subgroups are the same when the subgroups are combined.

(c)   relationships between percentages in subgroups are the same when the subgroups are separated.

(d)   relationships between percentages in subgroups disappear when the subgroups are separated.

(e)   a clear trend in individual groups of data is revealed when the groups are pooled together.

4. Suppose a set of bivariate data has a correlation coefficient of 0.90. Which statement is true?

(a)   90% of the points are highly correlated.

(b)   90% of the points fall on a line.

(c)   The linear regression line has a slope of 0.9.

(d)   90% of the points can be predicted by a linear regression line.

(e)   The data may have a strong linear trend.

5. In a dataset of size 6, the mean is 7 and standard deviation is 4.  We add 3 to each observation in the data. The new mean and standard deviation are respectively

(a)   7 and 4      (b)   10 and 4     (c)   10 and 7    (d)   7 and 7       (e)   10 and 13
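The effect of adding a constant to every observation can be checked numerically. The sketch below uses a made-up data set `x` of size 6 with mean 7 (the values are an assumption for illustration only, not the exam's data) and compares the mean and standard deviation before and after the shift.

```r
# Hypothetical data of size 6 with mean 7 (illustrative values only)
x <- c(2, 5, 7, 8, 9, 11)

# Add 3 to each observation, as described in the question
shifted <- x + 3

# Change in centre and change in spread caused by the shift
mean_change <- mean(shifted) - mean(x)
sd_change <- sd(shifted) - sd(x)

print(c(mean_change = mean_change, sd_change = sd_change))
```

Adding a constant moves every observation and the mean by the same amount, so the deviations from the mean, and hence the standard deviation, are unchanged.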


6. A box has a mean 7 and standard deviation 2. Which distribution can best approximate the distribution for the mean of 100 draws?

(a)   Normal with mean 7 and standard deviation 2.

(b)   Normal with mean 7 and standard deviation 0.2.

(c)   Normal with mean 700 and standard deviation 20.

(d)   T with degrees of freedom 99, mean 7 and standard deviation 2.

(e)   T with degrees of freedom 100, mean 700 and standard deviation 20.

7. For a quantitative data set, the mean and median are the same.  Which statement is true?

(a)   The histogram is skewed                 (b)   The scatterplot shows a linear trend

(c)   The barplot is balanced                   (d)   The boxplot is symmetric

(e)   not enough information

8. For a quantitative data set, the interquartile range (iqr) is 1. Which statement is true?

(a)   The data is skewed.                         (b)   There is no outlier.

(c)   The standard deviation is 1.            (d)   The boxplot has length 1.

(e)   The box in the boxplot has length 1.

9. If Z ~ N (0, 1) then P (-3 < Z < 3) is closest to

(a)   0.25           (b)   0.5             (c)   0.68           (d)   0.95           (e)   0.997

10. Which is the output for the following R command?

pnorm(0)

(a)   0               (b)   0.5             (c)   0.7             (d)   0.9             (e)   1

11. Using the linear regression line, which expression would predict y when x = 2?

##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##      1.8403       0.8655

(a)   1.8403 - 0.8655 × 2    (b)   1.8403 + 0.8655 × 2     (c)   0.8655 + 1.8403 × 2

(d)   0.8655 - 1.8403 × 2     (e)   Not enough information

12. A box contains the numbers 0,2,3,4,6 and 25 draws are made with replacement.  The expected sum of draws and the standard error are respectively

library(multicon)

box=c(0, 2, 3, 4, 6)

mean(box)

##  [1]  3

popsd(box)

##  [1]  2

(a)   3/25 and 2/√25         (b)   3 × √25 and 2/√25    (c)   3 × √25 and 2 × √25

(d)   3 × 25 and 2 × √25    (e)   3 × 25 and 2 × 25
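The expected value and standard error for a sum of draws with replacement can be computed directly from the box. The following is a minimal sketch; `pop_sd` is a hypothetical helper written here as a base-R stand-in for `popsd` from the multicon package, so that the example runs without that package.

```r
# The box from the question
box <- c(0, 2, 3, 4, 6)

# Hypothetical helper: population SD (divides by N, not N - 1),
# used as a stand-in for multicon::popsd
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

n <- 25  # number of draws, made with replacement

# For the SUM of n draws: EV = n * mean(box), SE = sqrt(n) * SD(box)
expected_sum <- n * mean(box)
se_sum <- sqrt(n) * pop_sd(box)

print(c(expected_sum = expected_sum, se_sum = se_sum))
```

The expected value grows in proportion to the number of draws, while the standard error grows only with its square root.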

13. A box contains the numbers 0,2,3,4,6 and 100 draws are made with replacement. Which R code calculates the probability that the mean of draws lies between 2.9 and 3.1?

(a)   pnorm((3.1-3)/(2*10))-pnorm((2.9-3)/(2*10))

(b)   pnorm((3.1-3)/(2/100))-pnorm((2.9-3)/(2/100))

(c)   pnorm((3.1-3)/(2/10))-pnorm((2.9-3)/(2/10))

(d)   pnorm((2.9-3)/(2/10))-pnorm((3.1-3)/(2/10))

(e)   pt((3.1-3)/(2/100),100)-pt((2.9-3)/(2/100),100)
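For the mean of draws, the standard error is the box SD divided by the square root of the number of draws, and the normal curve gives the probability of an interval as a difference of two `pnorm` values. The sketch below works through that calculation, assuming the normal approximation is adequate for 100 draws; `pop_sd` is again a hypothetical base-R helper, not a function from the question.

```r
# The box from the question
box <- c(0, 2, 3, 4, 6)

# Hypothetical helper: population SD of the box
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

n <- 100  # number of draws, made with replacement

# For the MEAN of n draws: EV = mean(box), SE = SD(box) / sqrt(n)
ev_mean <- mean(box)
se_mean <- pop_sd(box) / sqrt(n)

# Standardise both endpoints; subtract the lower tail from the upper one
prob <- pnorm((3.1 - ev_mean) / se_mean) - pnorm((2.9 - ev_mean) / se_mean)

print(c(se_mean = se_mean, prob = prob))
```

The upper endpoint comes first in the subtraction; reversing the order would give a negative number, which cannot be a probability.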

14. A box contains the numbers 0,2,3,4,6, each repeated 200 times, and 20 draws are made without replacement. Which of the following statements about the standard error for the sum of draws is TRUE?

(a)   It becomes one fourth if the sample size is half.

(b)   It drops to 0 if the sample size is 200.

(c)   It remains unchanged compared to the with replacement case.

(d)   It enlarges by a factor of compared to the with replacement case.

(e)   It shrinks by a factor of compared to the with replacement case.

15. A box contains nine “0” and one “1” and 16 draws are made with replacement. Which distribution will best approximate the distribution for the average of draws?

box=c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)

library(multicon)

mean(box)

popsd(box)

##  [1]  0.3

(a)   Normal with mean 0.025 and standard deviation 0.3/4.

(b)   Normal with mean 0.1 and standard deviation 0.3/16.

(c)   T with degrees of freedom 15, mean 1.6 and standard deviation 0.3*4.

(d)   Unknown skew distribution with mean 0.1 and standard deviation 0.3/4.

(e)   Unknown skew distribution with mean 0.4 and standard deviation 0.3*4.

16. A box contains nine “0” and one “1” and 16 draws are made with replacement. Using the simulation result below, what is the probability that the sum of draws is at least 6?

box=c(0,0,0,0,0,0,0,0,0,1)

sim=replicate(10000,sum(sample(box,16,rep=T)))

table(sim)

##  sim

##       0        1       2       3       4       5       6       7

##  1840  3382  2667  1416    533    134     26       2

(a)   0.0002       (b)   0.0026        (c)   0.0028       (d)   0.1840        (e)   0.9998
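A simulated probability is the proportion of simulated sums that satisfy the event. The sketch below re-enters the counts from the `table(sim)` output above and adds them up for sums of 6 or more. The exact binomial tail is an addition for comparison only and is not part of the question; it assumes each draw is a “1” with chance 0.1.

```r
# Counts copied from the table(sim) output in the question
counts <- c("0" = 1840, "1" = 3382, "2" = 2667, "3" = 1416,
            "4" = 533, "5" = 134, "6" = 26, "7" = 2)

sums <- as.numeric(names(counts))  # the possible values of the sum
n_sims <- sum(counts)              # total number of simulations

# Proportion of simulations in which the sum was at least 6
prop_at_least_6 <- sum(counts[sums >= 6]) / n_sims

# Exact tail probability for comparison: P(X >= 6) = 1 - P(X <= 5)
exact_tail <- 1 - pbinom(5, size = 16, prob = 0.1)

print(c(simulated = prop_at_least_6, exact = exact_tail))
```

"At least 6" includes the count for 6 itself, so both the 6 and 7 columns contribute.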

17. A marketing company is surveying consumers’ preference for Coke over Pepsi.  If p = P (Customer prefers Coke to Pepsi), the null and alternative hypotheses are respectively

(a)   H0  : p = 0.5 and H1  : p ≠ 0.5           (b)   H0  : p = 0.5 and H1  : p > 0.5

(c)   H0  : p ≠ 0.5 and H1  : p = 0.5           (d)   H0  : p = 0.5 and H1  : p < 0.5

(e)   H0  : p < 0.5 and H1 : p > 0.5

18. A marketing company is surveying consumers’ preference for Coke over Pepsi.  Which box could NOT model the null hypothesis?

(a)   0,1             (b)   0,1,1          (c)   0,0,1,1       (d)   0,0,0,1,1,1  (e)   0,0,0,0,1,1,1,1

19. A marketing company is surveying consumers’ preference for Coke over Pepsi. Out of 100 consumers surveyed, 60 prefer Coke to Pepsi. Which formula gives the test statistic?

box=c(0, 1)

mean(box)

popsd(box)

##  [1]  0.5

(a)   (60 - 50) / 10      (b)   (50 - 0.6) / 10      (c)   (0.6 - 0.5) / 0.05

(d)   (0.6 - 0.5) / 0.5     (e)   (0.5 - 0.6) / 5
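A test statistic of this kind has the form (observed value - expected value) / standard error, and it can be computed on the scale of the sum or of the proportion. The sketch below does both under the null box `c(0, 1)` shown in the question; `pop_sd` is a hypothetical base-R helper standing in for `popsd`.

```r
# Null-hypothesis box from the question: a preference is equally likely either way
box <- c(0, 1)

# Hypothetical helper: population SD of the box
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

n <- 100            # consumers surveyed
observed_sum <- 60  # number who prefer Coke

# Sum scale: EV = n * mean(box), SE = sqrt(n) * SD(box)
z_sum <- (observed_sum - n * mean(box)) / (sqrt(n) * pop_sd(box))

# Proportion scale: EV = mean(box), SE = SD(box) / sqrt(n)
z_mean <- (observed_sum / n - mean(box)) / (pop_sd(box) / sqrt(n))

print(c(z_sum = z_sum, z_mean = z_mean))
```

Both scales give the same statistic, provided the observed value, expected value and standard error are all expressed on the same scale.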

20. A test was conducted to test the hypotheses H0  : µA  = µB  vs H1  : µA  ≠ µB,  where µA and µB  represent the population means for groups A and B respectively.

##
##  Welch Two Sample t-test
##
## data:  yield by variety
## t = -4.9994, df = 19.441, p-value = 7.458e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9293569 -0.3814274
## sample estimates:
## mean in group A mean in group B
##        4.052941        4.708333

Based on the results above, which of the following statements is FALSE?

(a)   The test statistic is -4.9994.

(b)   The p-value is close to zero.

(c)   Equality of variance assumption is made.

(d)   The data are against H0.

(e)   The 95% confidence interval of µA - µB  excludes 0.


1. We are interested in monitoring the air quality index (AQI) in the month of July 2015 in two regions: Sydney’s central-east (CE) and Sydney’s north-west (NW). Because data readings from different pollutants have different underlying units of measure, the AQI is a derived value based on multiple data readings that enables easier comparison across regions and time. In general, an AQI score above 100 and below 150 indicates a ‘poor’ air quality level, and people in the sensitive group (e.g. people with asthma, older adults and children) should consider either cutting back or rescheduling strenuous outdoor activities. The general public is usually not affected by air quality within this AQI range.

Source: http://www.environment.nsw.gov.au/AQMS/search.htm

head(data)

      CEAQI NWAQI
 [1,]    99    92
 [2,]    32    44
 [3,]    70    82
 [4,]    74    96
 [5,]    95   100
  ...
[29,]    41    59
[30,]    48    57
[31,]    34    58

CE=data$CEAQI

NW=data$NWAQI

summary(CE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30.00   35.50   41.00   50.77   60.50  108.00

summary(NW)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   33.00   38.50   54.00   56.52   67.00  100.00

n  = nrow(data)

table(CE  >=  100)

##  FALSE   TRUE

##     30          1

table(NW  >=  100)

##  FALSE   TRUE

##     30          1


[Figures not reproduced: boxplot of CE and scatter plot of NW against CE]

Look over the R output and then answer the following questions.

(a)   Begin by examining the air quality index (AQI) of Sydney’s central-east (CE) region.

(i)   What is the mean AQI score for Sydney’s CE region during July?

(ii)   Comment on the shape of the boxplot for Sydney’s CE region during July.

(iii)   Would it be better to report the mean or median as the measure of centre for Sydney’s CE region? Explain.

(iv)   How many days in July was the air quality considered ‘poor’ in Sydney’s CE region?

(b)   The scatter plot graphically shows the air quality index (AQI) relationship between the two Sydney regions in the month of July.

(i)   From the scatter plot, suggest a value for the correlation coefficient between the two Sydney regions. Explain what it represents.

(ii)   Suppose on the 28th of July the air quality monitoring instrument was not working in Sydney’s NW region, but the AQI value in Sydney’s CE region was recorded as 40. Using the R output below, give an expression (without evaluating it) to estimate the AQI value in Sydney’s NW region.

L =  lm(NW  ~  CE)

round(L$coeff,3)

## (Intercept)          CE
##      19.887       0.714

(iii)   Assuming a normal distribution, if the percentile of AQI in CE is 50% (median) for a certain day, what would you expect the percentile of AQI in NW to be?

For another day, the percentile of AQI in CE is 90%. Which expression, a, b or c, will give the expected AQI in NW? Explain briefly.

a=qnorm(0.9)*cor(CE,NW)

b=pnorm(qnorm(0.9)*cor(CE,NW))

c=mean(NW)+qnorm