
Mathematics and Statistics

EXAMINATION

End-of-year Examinations, 2019

STAT221-19S2 (C)

Introduction to Statistical Computing Using R

Q.1  [7 marks]

We are setting up a pseudorandom number generator in our computer using a “linear congruential” number generator, which produces integer numbers via the following recursive algorithm:

$$X_{n+1} = (a X_n + c) \bmod m \qquad (1)$$
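As a minimal illustration of the recursion in Equation (1), the following R sketch implements a linear congruential generator. The function name `lcg` and the parameter values (a = 5, c = 3, m = 16, seed 1) are assumptions chosen only for this example; they are not taken from the examination.

```r
# Minimal linear congruential generator following Equation (1).
# 'lcg' and all parameter values below are hypothetical, for illustration only.
lcg <- function(n, seed, a, c, m) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state + c) %% m  # X_{n+1} = (a * X_n + c) mod m
    x[i] <- state
  }
  x
}

lcg(5, seed = 1, a = 5, c = 3, m = 16)
# 8 11 10 5 12
```

Because every state lies in 0, ..., m - 1, the sequence must eventually revisit a value, and from that point on it repeats.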

(a) We pick values for the parameters $a$, $c$, and $m$. Starting from a seed $X_0$, we use the above algorithm to produce a sequence of the first $m$ numbers $X_i$. We notice that $X_3$ and $X_{45}$ have the same value.

Is it a problem? Why? And, if you think it is a problem, can we fix it by changing $X_0$?

(b) Why do we call an algorithm to generate random numbers, i.e. the one expressed by Equation (1), a pseudo-random number generator?

(c) Why is it a good idea to start Monte Carlo simulations in R by first setting a random seed (i.e., using set.seed(...))?
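The effect of `set.seed()` can be seen in a short R sketch. The seed value 2019 and the variable names are arbitrary choices made for this illustration.

```r
# Resetting the same seed restarts the pseudorandom stream at the same point.
set.seed(2019)            # arbitrary seed, chosen for illustration
first_run <- runif(3)

set.seed(2019)            # same seed again
second_run <- runif(3)

identical(first_run, second_run)
# TRUE
```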

Q.2  [5 marks]

Consider rolling fair dice, each with the outcome $X_i \in \{1, 2, 3, 4, 5, 6\}$. The outcome of each die is independent of the others.

(a)  If we roll only one die, what is the probability distribution (express it as a table of outcomes and associated probabilities), the expectation, and the variance of its outcome?

(b) We roll two dice at the same time (independently). What is the probability distribution of the sum of their outcomes?

[Hint: Express it as a table of outcomes and associated probabilities.]

(c) We roll a thousand (1000) dice at the same time; each roll is independent of the others.

Based on the Central Limit Theorem, find an approximation of the probability distribution of the total sum of all values.

What are the mean and variance of the sum?

(d) We repeatedly roll a thousand dice $K$ times, computing the sum of their values $S_k$ for $k = 1, \dots, K$.

What can you say about the distribution of the average $\sum_{k=1}^{K} S_k / K$?
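The setting of parts (c) and (d) can be explored by simulation. The sketch below is one possible way to do so in R; the number of repetitions `K = 2000`, the seed, and all variable names are assumptions made for this example. It compares the simulated sums with the mean and variance implied by a single fair die (mean 3.5, variance 35/12).

```r
# Simulate K repetitions of rolling 1000 fair dice and summing the outcomes.
set.seed(1)               # arbitrary seed for reproducibility
K <- 2000                 # hypothetical number of repetitions
n_dice <- 1000

sums <- replicate(K, sum(sample(1:6, n_dice, replace = TRUE)))

# Moments of the sum implied by independence of the individual dice
clt_mean <- n_dice * 3.5
clt_var  <- n_dice * 35 / 12

c(empirical_mean = mean(sums), clt_mean = clt_mean)
c(empirical_var  = var(sums),  clt_var  = clt_var)
```

A histogram of `sums` (for example with `hist(sums)`) can then be compared visually with the normal approximation.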

Q.3  [8 marks]

Consider a continuous random variable with a well-defined CDF, $X \sim F_X$. Let $g$ be a monotonically increasing and invertible function. The CDF of the random variable $Y$ resulting from the transformation $Y = g(X)$ is given by:

$$F_Y(y) = F_X(g^{-1}(y)) \qquad (2)$$

and its corresponding probability density function (pdf) is given by:

$$f_Y(y) = f_X(g^{-1}(y)) \, \frac{d}{dy} g^{-1}(y) \qquad (3)$$

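(a) If $X \sim \text{Exponential}(\lambda)$ with cdf given by:

$$F(x) = \begin{cases} 1 - \exp(-\lambda x) & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

then show that the transformation $Y = \sqrt{2 \lambda \alpha^2 X}$ results in $Y \sim \text{Rayleigh}(\alpha)$ with cdf given by

$$F(y) = 1 - \exp\left(-\frac{y^2}{2\alpha^2}\right) \quad \text{for } y \ge 0 \qquad (5)$$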

(b) Use the pdf transformation in Equation (3) to derive the pdf of the Rayleigh($\alpha$) distribution. Do not simply differentiate the cdf of the Rayleigh distribution in Equation (5).
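A numerical check can complement the derivation. The sketch below assumes the exponential distribution is parameterised by a rate λ and the Rayleigh distribution by a scale α, with Y = sqrt(2 λ α² X); the values `lambda = 2` and `alpha = 1.5`, the seed, and the helper `rayleigh_cdf` are hypothetical choices for this illustration.

```r
# Compare the empirical CDF of Y = sqrt(2 * lambda * alpha^2 * X)
# with the Rayleigh CDF 1 - exp(-y^2 / (2 * alpha^2)).
set.seed(1)               # arbitrary seed
lambda <- 2               # hypothetical rate of the exponential
alpha  <- 1.5             # hypothetical Rayleigh scale
n <- 1e5

x <- rexp(n, rate = lambda)
y <- sqrt(2 * lambda * alpha^2 * x)

rayleigh_cdf <- function(y, alpha) 1 - exp(-y^2 / (2 * alpha^2))

grid <- seq(0, 6, by = 0.5)
max_gap <- max(abs(ecdf(y)(grid) - rayleigh_cdf(grid, alpha)))
max_gap   # should be small if the transformation behaves as claimed
```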

Q.4  [6 marks]

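We want to sample from a Beta distribution with parameters $\alpha = 3$ and $\beta = 2$. The density function is given by:

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \, x^{\alpha - 1} (1 - x)^{\beta - 1} = 12\, x^2 (1 - x), \qquad \text{with } 0 < x < 1.$$

Notice that the maximum of $f(x)$ is 16/9 in the interval $[0, 1]$.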

(a)  How can we sample from this distribution using the rejection sampling method?

Explain by making reference to the following graph:

[Figure: graph with horizontal axis x from 0.0 to 1.0 and vertical axis from 0 to 16/9.]

(b) Explain why it would have been less efficient to sample from $U \sim \text{Uniform}(0, 3)$ rather than $U \sim \text{Uniform}(0, 16/9)$.
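One way to make the rejection sampling idea concrete is the R sketch below. It assumes a Uniform(0, 1) proposal for x and a uniform vertical coordinate on (0, bound); the function name `rbeta32_reject`, the number of proposals, and the seed are hypothetical. The target density is evaluated with `dbeta(x, 3, 2)`.

```r
# Rejection sampling for a Beta(3, 2) target using a rectangular envelope.
# 'rbeta32_reject' is a hypothetical helper written for this illustration.
set.seed(1)               # arbitrary seed
rbeta32_reject <- function(n_proposals, bound = 16 / 9) {
  x <- runif(n_proposals)                    # candidate values on (0, 1)
  u <- runif(n_proposals, min = 0, max = bound)  # vertical coordinate
  x[u < dbeta(x, shape1 = 3, shape2 = 2)]    # keep points under the density
}

n_prop <- 1e5
draws_tight <- rbeta32_reject(n_prop, bound = 16 / 9)
draws_loose <- rbeta32_reject(n_prop, bound = 3)

# Fraction of proposals accepted under each envelope height
accept_rate <- c(tight = length(draws_tight) / n_prop,
                 loose = length(draws_loose) / n_prop)
accept_rate
mean(draws_tight)         # compare with the Beta(3, 2) mean of 3/5
```

Since the density integrates to 1, the expected acceptance rate equals 1 divided by the envelope height, which is why the taller envelope wastes more proposals.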

Q.5  [8 marks]

Given a sample of observations of size $n = 10$ with $X_i \sim N(\mu = 1, \sigma = 2)$, we want to simulate the power of rejecting the null hypothesis

$$H_0: \mu \ge 1$$

at a type-I-error rate of $\alpha = 0.05$.

(a)  Find the four mistakes in the following R code:

    rejectH0 <- logical(1000)
    for (i in 1:1000) {
      x <- rnorm(n = 10, mean = 0, sd = 2)
      pvalue <- t.test(x, mu = 1, alternative = "less")$p.value
      rejectH0 <- pvalue > 0.05
    }
    mean(rejectH0)
    ## [1] 1

(b) After correcting all mistakes, what result would you expect from the simulation study?

(c) In the following figure, highlight the type-I-error $\alpha$, the type-II-error $\beta$, and the statistical power.

 

Q.6  [5 marks]

Consider the two samples

$$x_i = \{4, 2\} \quad \text{and} \quad y_j = \{1, 3\}$$

We want to use a permutation test to test the hypothesis that the mean for population X is larger than the mean for population Y.

$$H_0: \mu_X < \mu_Y$$

(a)  How many combinations are possible to uniquely allocate individuals into the two groups?

(b)  Calculate a test statistic for the observed sample and for all permutations to test the hypothesis of interest.

(c) Calculate the p-value. Can $H_0$ be rejected at a type-I-error rate of $\alpha = 0.05$?
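The enumeration behind such a permutation test can be sketched in R. The choice of the difference in group means as the test statistic, and all variable names, are assumptions for this illustration; other statistics are possible.

```r
# Exact permutation test for two small samples, using the difference in
# group means as the (assumed) test statistic.
x <- c(4, 2)
y <- c(1, 3)
pooled <- c(x, y)

observed <- mean(x) - mean(y)

# Every way to choose which pooled observations form the first group
splits <- combn(length(pooled), length(x))
perm_stats <- apply(splits, 2, function(idx) {
  mean(pooled[idx]) - mean(pooled[-idx])
})

# One-sided p-value: share of allocations at least as extreme as observed
p_value <- mean(perm_stats >= observed)

ncol(splits)   # number of distinct allocations
perm_stats
p_value
```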

Q.7  [5 marks]

The variance $\hat{\sigma}_b^2$ has been estimated for five bootstrap samples,

$$\hat{\sigma}_b^2 = \{9.61, 2.25, 2.89, 5.29, 6.25\},$$

that were generated from a sample of normally distributed observations with a variance of $\hat{\sigma}^2 = 5.76$.

(a) Estimate the bias of the standard deviation estimator.

(b) Estimate the standard error of the standard deviation statistic.

(c) Find $(1 - \alpha) = 0.5$ standard normal confidence limits for the standard deviation.

(d) Find $(1 - \alpha) = 0.5$ percentile confidence limits for the standard deviation.
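The bootstrap quantities in this question can be computed along the following lines. This sketch assumes the five listed values are bootstrap variance estimates and that 5.76 is the variance estimate from the original sample; it takes the bias as the mean of the bootstrap standard deviations minus the original estimate, and uses `sd()` (divisor B - 1) for the standard error, which is one common convention. Variable names are hypothetical.

```r
# Bootstrap bias and standard error for the standard deviation.
var_boot <- c(9.61, 2.25, 2.89, 5.29, 6.25)  # bootstrap variance estimates
sd_boot  <- sqrt(var_boot)                   # bootstrap standard deviations
sd_hat   <- sqrt(5.76)                       # estimate from the original sample

bias_hat <- mean(sd_boot) - sd_hat           # bootstrap estimate of bias
se_hat   <- sd(sd_boot)                      # uses divisor B - 1 (assumed convention)

c(bias = bias_hat, se = se_hat)
```

Percentile limits could be read off the same vector with `quantile(sd_boot, ...)`, keeping in mind that R offers several quantile definitions, which matter with only five replicates.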

Q.8  [6 marks]

Recall that the kernel density estimator is given by:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where K(u) is the kernel function.
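As a rough illustration of the estimator, the R sketch below evaluates a kernel density estimate directly from the standard definition (the average of rescaled kernels centred at the datapoints). The function name `kde` is hypothetical, and the Gaussian kernel `dnorm` is only a stand-in assumption; it is not necessarily the kernel intended in this question.

```r
# Direct evaluation of a kernel density estimate:
#   f_hat(x) = 1 / (n * h) * sum_i K((x - x_i) / h)
# 'kde' is a hypothetical helper; the default Gaussian kernel is an assumption.
kde <- function(x, data, h, K = dnorm) {
  sapply(x, function(x0) sum(K((x0 - data) / h)) / (length(data) * h))
}

data_points <- c(0.5, 2)
grid <- seq(-3, 6, by = 0.5)
f_hat <- kde(grid, data = data_points, h = 1)

# A density estimate should integrate to (approximately) one
total_mass <- integrate(kde, lower = -Inf, upper = Inf,
                        data = data_points, h = 1)$value
total_mass
```

Any other kernel can be supplied through the `K` argument as a function of `u`, for example `K = function(u) 0.5 * (abs(u) <= 1)` for a uniform kernel.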

(a) Suppose we have two datapoints at $x_1 = 0.5$ and $x_2 = 2$. Draw on the following graph the kernel density estimate $\hat{f}(x)$ using the kernel

Use a bandwidth of $h = 1$.

[Figure: blank axes for the sketch, with horizontal axis x.]

(b) For one datapoint $x_1 = 0.5$, under the assumption that $x > 0$, use the reflection method to draw a boundary-corrected kernel density estimate $\hat{f}(x)$ using the kernel

Use a bandwidth of $h = 1$.

[Figure: blank axes for the sketch, with horizontal axis x.]