
STAT 231 Fall 2022

Assignment 4

Many problems on this assignment indicate that your answers must be given in sentences. This course emphasizes learning to communicate statistical concepts in sentences.

In some of the problems on this assignment you are asked to use R. Only the answers/results you obtain using R must be included in your Crowdmark PDF submission. Your R code must be uploaded as an R file to the Assignment 4 R Dropbox. Effectively commenting your code is an important skill to develop. Markers will review your file and run it to verify that the answers match those in your Crowdmark submission and that the code runs without error. Your code must correctly find the answers needed to get the marks associated with the problems. Good commenting will allow the marker to more easily assign you a full score when reviewing your file.

Assignment 4 Learning Outcomes

Here are the intended learning outcomes for this assignment component. Try to identify the learning outcomes which are achieved by each of the given problems.

Enjoy!

· Perform a test of hypothesis for Binomial(n,θ), Poisson(θ), and Exponential(θ) models using a test statistic based on the asymptotic Gaussian pivotal quantity and a likelihood ratio test.

· Perform a test of hypothesis for the parameter μ and the parameter σ in a Gaussian model.

· Observe the connection between confidence intervals, likelihood intervals and hypothesis tests.

· Observe how p-values vary as the hypothesized value and sample size vary.

In Problems 1-4 you will continue to analyse variates in your data set. Make sure that you use the same data set that you generated, saved, and uploaded to the LEARN Dropbox as part of the Prerequisite Assignment.

Your commented R code should only be included in the R file that you upload to the LEARN Dropbox. Do not include your R code in your answers submitted to Crowdmark. All written answers must be in full sentences. Please do not include any instructions in your assignment submission to Crowdmark or the LEARN Dropbox.

Note: When conducting a test of hypothesis you should use a two-sided test unless otherwise stated. Be sure to show how your p-value was determined. When stating a conclusion about a null hypothesis please use the guidelines in Table 5.1 of the Course Notes.

Problem 1: Tests of hypothesis for Binomial model

The purpose of this problem is to test the hypothesis H0: θ = θ0 for Binomial(n,θ) data. See Sections 5.1 to 5.3 and Table 5.2 of the Course Notes.

In this problem you will examine the data for the variate Eye.Colour (What is your eye colour?).

(a) Complete the following statements:

(i) My student ID number is  __________.

(ii) The sample size for my data set is  __________.

(iii) The variate Eye.Colour is a __________ variate.

(iv) The number of observations for which the variate Eye.Colour was not missing is __________.

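Define the new variate $y_i = 1$ if student i indicated that they have blue eyes and $y_i = 0$ otherwise. Let $y = \sum_{i=1}^{n} y_i$ be the number of students in your data set who indicated that they have blue eyes.

(v) Complete the sentence: The value of $y$ for my data set is __________.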

(b) Describe a suitable study population for the empirical study which you have been conducting on the assignments and for which the sampling protocol consisted of downloading a data set from the CensusAtSchool New Zealand 2021 website.

(c) Let the random variable Y be the number of students with blue eyes. Assume that Y has a Binomial(n,θ) distribution.

(i) The parameter θ corresponds to what attribute of interest in the study population?

(ii) What is the maximum likelihood estimate of θ for your data set if a Binomial model is assumed?

(d) Ten percent of the world’s population has blue eyes. You are interested in whether the percentage of students with blue eyes in the study population in (b) is also 10%. Assume a Binomial model for the following tests of hypotheses.

(i) For your data determine d, the observed value of the test statistic

$$D = \frac{|\tilde{\theta} - \theta_0|}{\sqrt{\theta_0 (1 - \theta_0)/n}}$$

for testing H0: θ = θ0 where θ0 = 0.1.

(ii) Use the value of d determined in (i) and the Normal approximation to the Binomial distribution to determine the approximate p-value for testing H0: θ = 0.1. (A continuity correction is not required.) State your conclusion regarding the hypothesis H0: θ = 0.1 based on the approximate p-value.

(iii) Is the value θ = 0.1 an element of an approximate 95% confidence interval for θ based on the asymptotic Gaussian pivotal quantity? Explain why or why not using only the p-value determined in (ii).

(iv) For your data determine the observed value of the likelihood ratio statistic λ(θ0) where θ0 = 0.1.

Note: The following R code will calculate the likelihood ratio statistic for a test of the null hypothesis that theta = theta0 for the Binomial model if the maximum likelihood estimate of theta is thetahat and the sample size is n:

lambda<-(-2*log((theta0/thetahat)^(n*thetahat)*((1-theta0)/(1-thetahat))^(n-n*thetahat)))

or equivalently

lambda<-2*n*(thetahat*log(thetahat/theta0)+ (1-thetahat)*log((1-thetahat)/(1-theta0)))

(v) Use the value of λ(0.1) and the asymptotic distribution of the likelihood ratio statistic to determine the approximate p-value for testing H0: θ = 0.1. State your conclusion regarding H0: θ = 0.1 based on the approximate p-value.

(vi) Is your conclusion in (v) the same as your conclusion in (ii)? Briefly explain why you would (or would not) expect these conclusions to be the same.

(vii) Is the value θ = 0.1 an element of a 15% likelihood interval for θ? Explain why or why not without determining the interval.
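The calculations in part (d) can be sketched in R as follows. This is a minimal illustration using hypothetical counts (n = 200, y = 30), not values from any real data set. It assumes the asymptotic Gaussian test statistic has the form |θ̂ − θ0| / sqrt(θ0(1 − θ0)/n), which should be checked against Table 5.2 of the Course Notes. All variable names are illustrative.

```r
# Hypothetical values for illustration only; substitute your own data.
n <- 200           # number of non-missing observations (assumed)
y <- 30            # number of students with blue eyes (assumed)
theta0 <- 0.1      # hypothesized value under H0

thetahat <- y / n  # maximum likelihood estimate of theta

# Assumed form of the asymptotic Gaussian test statistic
d <- abs(thetahat - theta0) / sqrt(theta0 * (1 - theta0) / n)
p_gauss <- 2 * (1 - pnorm(d))       # two-sided p-value from N(0,1)

# Likelihood ratio statistic, using the second formula from the Note
lambda <- 2 * n * (thetahat * log(thetahat / theta0) +
                   (1 - thetahat) * log((1 - thetahat) / (1 - theta0)))
p_lr <- 1 - pchisq(lambda, df = 1)  # approximate p-value from chi-squared(1)

# Relative likelihood at theta0: lambda = -2 * log(R(theta0)), so
# theta0 is in a 100p% likelihood interval exactly when rel_lik >= p.
rel_lik <- exp(-lambda / 2)

print(c(d = d, p_gauss = p_gauss, lambda = lambda, p_lr = p_lr, rel_lik = rel_lik))
```

Because both p-values rest on large-sample approximations, they will usually be close but not identical.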

Problem 2: Tests of hypothesis for Poisson model

The purpose of this problem is to test the hypothesis H0: θ = θ0 for Poisson(θ) data. See Sections 5.1 to 5.3, and Table 5.2 of the Course Notes.

In this problem you will examine the data for the derived variate Technology that you analysed in Problem 4 on Assignment 3.

(a) Assume a Poisson model for the variate Technology. Recall that you examined the fit of the Poisson model to these data on Assignment 3.

(i) The parameter θ corresponds to what attribute of interest in the study population?

(ii) What is the maximum likelihood estimate of θ for your data set?

(b) A well-known child psychologist in New Zealand has recommended that students should only engage in 4 types of technology in a week.  You are interested in whether the mean number of total ticks for the variate Technology in the study population equals 4. Assume a Poisson model for the following tests of hypotheses.

(i) For your data determine d, the observed value of the test statistic

$$D = \frac{|\tilde{\theta} - \theta_0|}{\sqrt{\theta_0/n}}$$

for testing H0: θ = θ0 where θ0 = 4.

(ii) Use the value of d determined in (i) and the Normal approximation to the Poisson distribution to determine the approximate p-value for testing H0: θ = 4. (A continuity correction is not required.) State your conclusion regarding H0: θ = 4 based on the approximate p-value.

(iii) Is the value θ = 4 an element of an approximate 90% confidence interval for θ based on the asymptotic Gaussian pivotal quantity? Explain why or why not using only the p-value determined in (ii).

(iv) For your data determine the observed value of the likelihood ratio statistic λ(θ0) where θ0 = 4.

Note: The following R code will calculate the likelihood ratio statistic for a test of the null hypothesis that theta = theta0 for the Poisson model if the maximum likelihood estimate of theta is thetahat and the sample size is n:

lambda<-(-2*log((theta0/thetahat)^(n*thetahat)*exp(n*(thetahat-theta0))))

or equivalently

lambda<-2*n*(thetahat*log(thetahat/theta0)+(theta0-thetahat))

(v) Use the value of λ(4) and the asymptotic distribution of the likelihood ratio statistic to determine the approximate p-value for testing H0: θ = 4. State your conclusion regarding H0: θ = 4 based on the approximate p-value.

(vi) Is your conclusion in (v) the same as your conclusion in (ii)? Briefly explain why you would (or would not) expect these conclusions to be the same.

(vii) Is the value θ = 4 an element of a 10% likelihood interval for θ? Explain why or why not without determining the interval.
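As a sanity check on the Note in (b)(iv), the two expressions for the Poisson likelihood ratio statistic can be compared numerically. The sketch below uses ten made-up counts, not real data, and assumes the asymptotic Gaussian test statistic has the form |θ̂ − θ0| / sqrt(θ0/n); verify that form against Table 5.2 of the Course Notes.

```r
# Hypothetical counts for illustration only (not from any real data set)
y <- c(3, 5, 4, 6, 2, 5, 7, 4, 3, 6)
n <- length(y)
theta0 <- 4
thetahat <- mean(y)   # MLE of theta under the Poisson model

# Assumed form of the asymptotic Gaussian test statistic
d <- abs(thetahat - theta0) / sqrt(theta0 / n)
p_gauss <- 2 * (1 - pnorm(d))

# The two forms of the likelihood ratio statistic given in the Note
lambda1 <- -2 * log((theta0 / thetahat)^(n * thetahat) * exp(n * (thetahat - theta0)))
lambda2 <- 2 * n * (thetahat * log(thetahat / theta0) + (theta0 - thetahat))

# They should agree up to floating-point rounding
print(all.equal(lambda1, lambda2))

p_lr <- 1 - pchisq(lambda2, df = 1)
print(c(d = d, p_gauss = p_gauss, lambda = lambda2, p_lr = p_lr))
```

The second form works on the log scale throughout, so it is less prone to overflow or underflow when n is large.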

Problem 3: Tests of hypothesis for Exponential model

The purpose of this problem is to test the hypothesis H0: θ = θ0 for Exponential(θ) data. See Sections 5.1 to 5.3, and Table 5.3 of the Course Notes.

In this problem you will examine the data for the variate Travel.time.to.school. (How long does it usually take you to get to school? Answer to the nearest minute.)

(a) Let the random variable Yi be the travel time to school. Assume that Yi has an Exponential(θ) distribution, i=1,2,…,n. Recall that you analysed the fit of the Exponential model to these data in Problem 5, Assignment 2.

(i) The parameter θ corresponds to what attribute of interest in the study population?

(ii) What is the maximum likelihood estimate of θ for your data set?

(b) In 2011 the average time to school was 17 minutes for all the students in the study population. You are interested in whether the mean travel time to school for students in the study population in 2021 is 17 minutes. Assume an Exponential model for the following tests of hypotheses.

(i) For your data determine d, the observed value of the test statistic

$$D = \frac{|\tilde{\theta} - \theta_0|}{\theta_0/\sqrt{n}}$$

for testing H0: θ = θ0 where θ0 = 17.

(ii) Use the value of d determined in (i) and the Normal approximation to the Exponential distribution to determine the approximate p-value for testing H0: θ = 17. State your conclusion regarding H0: θ = 17 based on the approximate p-value.

(iii) Is the value θ = 17 an element of an approximate 99% confidence interval for θ based on the asymptotic Gaussian pivotal quantity? Explain why or why not using only the p-value determined in (ii).

(iv) For your data determine the observed value of the likelihood ratio statistic λ(θ0) where θ0 = 17.

Note: The following R code will calculate the likelihood ratio statistic for a test of the null hypothesis that theta = theta0 for the Exponential model if the maximum likelihood estimate of theta is thetahat and the sample size is n:

lambda<-(-2*log((thetahat/theta0)^n*exp(n*(1-thetahat/theta0))))

or equivalently

lambda<-2*n*(log(theta0/thetahat)+(thetahat/theta0-1))

(v) Use the value of λ(17) and the asymptotic distribution of the likelihood ratio statistic to determine the approximate p-value for testing H0: θ = 17. State your conclusion regarding H0: θ = 17 based on the approximate p-value.

(vi) Is your conclusion in (v) the same as your conclusion in (ii)? Briefly explain why you would (or would not) expect these conclusions to be the same.

(vii) Is the value θ = 17 an element of a 5% likelihood interval for θ? Explain why or why not without determining the interval.

(viii) For your data determine d1, the observed value of the test statistic

$$D_1 = \frac{2 \sum_{i=1}^{n} Y_i}{\theta_0} = \frac{2 n \tilde{\theta}}{\theta_0}$$

for testing H0: θ = θ0 where θ0 = 17.

(ix) Use the value of d1 determined in (viii) and the exact distribution of D1 to determine the p-value for testing H0: θ = 17 (see Table 5.3). Use the R command pchisq for your calculation. State your conclusion regarding H0: θ = 17 based on the p-value.

(x) Is your conclusion in (ix) the same as in (ii)? Briefly explain why you would (or would not) expect these conclusions to be the same.
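The exact test in (viii) and (ix) can be sketched with pchisq as follows. The travel times are made up for illustration. The sketch assumes the exact statistic is 2ΣYi/θ0, which has a chi-squared distribution with 2n degrees of freedom under H0, and that the two-sided p-value is twice the smaller tail area; confirm both against Table 5.3 of the Course Notes.

```r
# Hypothetical travel times in minutes, for illustration only
y <- c(5, 12, 20, 8, 35, 15, 10, 25, 3, 18)
n <- length(y)
theta0 <- 17
thetahat <- mean(y)   # MLE of theta under the Exponential model

# Asymptotic Gaussian statistic (assumed form), for comparison
d <- abs(thetahat - theta0) / (theta0 / sqrt(n))
p_gauss <- 2 * (1 - pnorm(d))

# Assumed exact statistic: chi-squared with 2n degrees of freedom under H0
d1 <- 2 * sum(y) / theta0
p_lower <- pchisq(d1, df = 2 * n)        # P(W <= d1)
p_upper <- 1 - p_lower                   # P(W >= d1)
p_exact <- min(2 * p_lower, 2 * p_upper) # two-sided p-value

print(c(d = d, p_gauss = p_gauss, d1 = d1, p_exact = p_exact))
```

With a sample this small the exact and approximate p-values can differ noticeably; the gap generally shrinks as n grows.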

Problem 4: Tests of hypotheses for Gaussian data

The purpose of this problem is to test hypotheses about the parameters μ and σ for Gaussian data. See Sections 5.1 to 5.3, and Table 5.3 of the Course Notes.

In this problem you will examine the data for the Bag.weight variate. (What is the weight of your school bag today? Answer in kilograms to one decimal place. Weigh your school bag with all your books and other materials you brought to school today.)

(a) Complete the following statements:

(i) My student ID number is  __________.

(ii) The sample size for my data set is  __________.

(iii) The variate Bag.weight is a __________ variate.

(iv) The number of observations for which the variate Bag.weight was not missing is __________.

(b) Let the random variable Yi be the bag weight in kilograms. Assume that Yi has a G(μ,σ) distribution, i=1,2,…,n.

(i) The parameters μ and σ correspond to what attributes of interest in the study population?

(ii) Give the sample mean and sample standard deviation for this variate.

(iii) Give the sample skewness and sample kurtosis for this variate.

(iv) Give a qqplot for this variate.

(v) Is the Gaussian model a good fit to these data? Justify your answer.

(c) In 2011 the average bag weight was 3 kilograms for all the students in the study population. You are interested in whether the mean bag weight for students in the study population in 2021 is 3 kilograms.

(i) Use your data to test the hypothesis H0: μ = 3.

Be sure to state the observed value of the test statistic

$$D = \frac{|\bar{Y} - \mu_0|}{S/\sqrt{n}}, \quad \mu_0 = 3,$$

the corresponding p-value, and your conclusion based on this p-value. Explain how the p-value is determined by the R function t.test.

Note: To test hypotheses about the mean for a Gaussian model you can use the R command t.test(). See Chapter 5, Problem 3, for an example. You can also extract specific results from the output of t.test() directly. For example, to test the null hypothesis that the mean of a Gaussian distribution is 3 for a sample called y, you use the R command

t.test(y, mu = 3)$p.value

and R returns the p-value specifically. You can also use $statistic and $parameter in a similar manner.

(ii)  Is the value μ = 3 an element of a 90% confidence interval for μ? Explain why or why not using only the p-value determined in (c)(i).
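To see how the components returned by t.test() relate to the test statistic, the following sketch computes the statistic and p-value by hand and compares them with the t.test() output. The bag weights are invented for illustration. The hand calculation assumes the usual one-sample statistic |ȳ − μ0| / (s/√n) compared with a t distribution on n − 1 degrees of freedom.

```r
# Hypothetical bag weights in kilograms, for illustration only
y <- c(2.5, 3.8, 4.1, 3.0, 5.2, 2.9, 3.6, 4.4, 3.3, 2.7)
mu0 <- 3
n <- length(y)

# Hand calculation of the test statistic and two-sided p-value
d <- abs(mean(y) - mu0) / (sd(y) / sqrt(n))
p_manual <- 2 * (1 - pt(d, df = n - 1))

# The same test using t.test(); two-sided by default
res <- t.test(y, mu = mu0)
print(res$statistic)  # signed t statistic; its absolute value should equal d
print(res$parameter)  # degrees of freedom, n - 1
print(res$p.value)    # should equal p_manual

# 90% confidence interval for mu (the default level is 95%)
ci90 <- t.test(y, mu = mu0, conf.level = 0.90)$conf.int
print(ci90)
```

Note that $statistic is signed, whereas the test statistic for a two-sided test is its absolute value.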

(d) (i) Use your data to test the hypothesis H0: σ = 2.

Be sure to state the observed value of the test statistic

$$U = \frac{(n-1) S^2}{\sigma_0^2}, \quad \sigma_0 = 2,$$

the corresponding p-value, and your conclusion based on this p-value.

(ii) Is the value σ = 2 an element of a 99% confidence interval for σ? Explain why or why not using only the p-value determined in (d)(i).
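Base R has no built-in one-sample test for σ, so the test in (d) is usually computed directly with pchisq. The sketch below uses hypothetical summary statistics (n = 25, s = 2.6). It assumes the statistic (n − 1)s²/σ0² with a chi-squared distribution on n − 1 degrees of freedom, and a two-sided p-value equal to twice the smaller tail area; check both against Table 5.3 of the Course Notes.

```r
# Hypothetical summary statistics, for illustration only
n <- 25        # sample size (assumed)
s <- 2.6       # sample standard deviation (assumed)
sigma0 <- 2    # hypothesized value under H0

# Assumed test statistic: chi-squared(n - 1) under H0
u <- (n - 1) * s^2 / sigma0^2
p_lower <- pchisq(u, df = n - 1)
p_value <- min(2 * p_lower, 2 * (1 - p_lower))

# Equal-tailed 99% confidence interval for sigma
ci99 <- sqrt((n - 1) * s^2 / qchisq(c(0.995, 0.005), df = n - 1))

# sigma0 lies in the 99% interval exactly when the p-value is at least 0.01
in_ci <- sigma0 >= ci99[1] && sigma0 <= ci99[2]
print(c(u = u, p_value = p_value))
print(ci99)
print(in_ci == (p_value >= 0.01))
```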

Problem 5: Tests of hypotheses and shiny app

Go to the shiny app: https://shiny.math.uwaterloo.ca/sas/stat231/teststatistics/

You can use this app to explore test statistics and hypothesis tests. You can first choose a probability distribution and a test statistic. You then specify a value for the model parameter under the null hypothesis. You can then adjust the sample size, and set the point estimate of the model parameter resulting from the sample. The right-hand window then displays a plot of the probability distribution corresponding to the test statistic chosen. The plot is then separated into regions based on the value of the resulting test statistic. You should think about how the areas under the probability distribution curves correspond to the resulting p-values.
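The link between tail areas and the p-value can also be explored outside the app. The function below is a hypothetical helper, not the app's own code; the app's internals are not documented here. It assumes the Poisson asymptotic Gaussian statistic |θ̂ − θ0| / sqrt(θ0/n) and reports the two-sided p-value as the sum of the two tail areas of the standard Gaussian density. The inputs (θ0 = 5, n = 50) are arbitrary illustrative values.

```r
# Hypothetical helper: asymptotic Gaussian test for a Poisson mean.
# thetahat may be a vector; theta0 and n are single values.
poisson_gauss_test <- function(thetahat, theta0, n) {
  d <- abs(thetahat - theta0) / sqrt(theta0 / n)  # assumed form of the statistic
  lower_tail <- pnorm(-d)      # area to the left of -d
  upper_tail <- 1 - pnorm(d)   # area to the right of d
  data.frame(thetahat = thetahat, d = d, p_value = lower_tail + upper_tail)
}

# Arbitrary illustrative inputs
res <- poisson_gauss_test(thetahat = c(4.5, 4.8, 5.2, 5.5), theta0 = 5, n = 50)
print(res)
```

Estimates at equal distances on either side of θ0 yield the same statistic, because only the absolute difference enters the formula.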

Part A: Poisson(θ)

(a) On the shiny app select Poisson as the distribution, Asymptotic Gaussian as the test statistic, 3 as the H0 value for θ, and 20 as the sample size. As you move the slider for MLE of θ, you will see how the value of the test statistic and the corresponding p-value for testing H0: θ = 3 vary as the value of θ̂ varies.

(i) Use the shiny app to complete Table 1.

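Table 1

θ̂      |θ̂ – 3|    value of Gaussian test statistic    p-value
2       1          __________                          __________
2.2     0.8        __________                          __________
2.4     0.6        __________                          __________
2.6     0.4        __________                          __________
2.8     0.2        __________                          __________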

(ii) How does the p-value for testing H0: θ = 3 change as the quantity |θ̂ – 3| decreases? Explain why this behaviour makes sense.

(iii) What other value of θ̂ generates the identical test statistic and p-value as when |θ̂ – 3| = 0.8?

(iv) For θ̂ = 2.4, use the information from Table 1 to determine what the p-value is for testing H0: θ = 3 versus the one-sided alternative hypothesis HA: θ < 3.

(b) On the shiny app select Poisson as the distribution, Asymptotic Gaussian as the test statistic, 3 as the H0 value for θ, and 2.4 as the MLE of θ. As you change the value of the sample size, you will see how the value of the test statistic and the corresponding p-value for testing H0: θ = 3 vary as the value of the sample size varies.

(i) Use the shiny app to complete Table 2.

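Table 2

sample size    value of Gaussian test statistic    p-value
20             __________                          __________
25             __________                          __________
30             __________                          __________
35             __________                          __________
40             __________                          __________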

(ii) How does the p-value for testing H0: θ = 3 change as the sample size increases for a fixed value of θ̂? Explain why this behaviour makes sense.

(c) On the shiny app select Poisson as the distribution, Likelihood ratio as the test statistic, 3 as the H0 value for θ, and 20 as the sample size. As you move the slider for MLE of θ, you will see how the value of the test statistic and the corresponding p-value for testing H0: θ = 3 vary as the value of θ̂ varies.

(i) Use the shiny app to complete Table 3.

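Table 3

θ̂      |θ̂ – 3|    value of LR test statistic    p-value
2       1          __________                    __________
2.2     0.8        __________                    __________
2.4     0.6        __________                    __________
2.6     0.4        __________                    __________
2.8     0.2        __________                    __________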

(ii) Compare the p-values in Table 3 with the p-values in Table 1. If you use Table 5.1 as your guide for conclusions, is there any value of θ̂ in Table 1 which gives a different conclusion regarding the hypothesis H0: θ = 3 for the same value of θ̂ in Table 3?

Part B: Gaussian mean μ

(a) On the shiny app select G(μ,σ) as the distribution, Mean (μ) as the Test for mean or standard deviation, 0 as the H0 value for μ, and 25 as the sample size, 0.5 as the sample mean, and 2 as the sample standard deviation.  As you change the value of the sample size, you will see how the value of the test statistic and the corresponding p-value for testing H0: μ =0  vary as the value of the sample size varies.

(i) Use the shiny app to complete Table 4.

Table 4

sample size    value of test statistic for mean    p-value
25             __________                          __________
35             __________                          __________
45             __________                          __________
55             __________                          __________

(ii) How does the p-value for testing H0: μ = 0 change as the sample size increases for a fixed value of the sample standard deviation? Explain why this behaviour makes sense.

(b) On the shiny app select G(μ,σ) as the distribution, Mean (μ) as the Test for mean or standard deviation, 0 as the H0 value for μ, and 30 as the sample size, 0.5 as the sample mean, and 0.8 as the sample standard deviation. As you move the slider for sample standard deviation, you will see how the value of the test statistic and the corresponding p-value for testing H0: μ = 0  vary as the value of the sample standard deviation varies.

(i) Use the shiny app to complete Table 5.

Table 5

sample standard deviation    value of test statistic for mean    p-value
0.8                          __________                          __________
1.2                          __________                          __________
1.6                          __________                          __________
2.0                          __________                          __________

(ii) How does the p-value for testing H0: μ = 0 change as the sample standard deviation increases for a fixed value of the sample size? Explain why this behaviour makes sense.
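The quantities explored in Part B depend on the data only through the sample mean, sample standard deviation and sample size, so they can be reproduced from summary statistics alone. The function below is a hypothetical helper, not the app's code, and assumes the one-sample statistic |ȳ − μ0| / (s/√n) with a t distribution on n − 1 degrees of freedom. The inputs are arbitrary and differ from those in the tables.

```r
# Hypothetical helper: one-sample t test from summary statistics.
# s or n may be a vector, so one argument can be varied at a time.
t_test_from_summary <- function(ybar, s, n, mu0 = 0) {
  d <- abs(ybar - mu0) / (s / sqrt(n))   # observed test statistic
  p <- 2 * (1 - pt(d, df = n - 1))       # two-sided p-value from t(n - 1)
  data.frame(n = n, s = s, d = d, p_value = p)
}

# Vary the sample size with the mean and standard deviation held fixed
by_n <- t_test_from_summary(ybar = 1, s = 3, n = c(10, 20, 40))
print(by_n)

# Vary the standard deviation with the mean and sample size held fixed
by_s <- t_test_from_summary(ybar = 1, s = c(2, 3, 4), n = 20)
print(by_s)
```

Comparing the two printed tables with the patterns seen in the app is a quick way to check your reading of Tables 4 and 5.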

Part C:  Gaussian standard deviation σ

(a) On the shiny app select G(μ,σ) as the distribution, Standard deviation (σ) as the Test for mean or standard deviation, 4 as the H0 value for σ, and 30 as the sample size, 1 as the sample mean, and 4.4 as the sample standard deviation. As you move the slider for the sample standard deviation, you will see how the value of the test statistic and the corresponding p-value for testing H0: σ = 4  vary as the value of the sample standard deviation varies.

(i) Use the shiny app to complete Table 6.

Table 6

sample standard deviation    value of test statistic for standard deviation    p-value
4.4                          __________                                        __________
4.7                          __________                                        __________
5.2                          __________                                        __________
5.5                          __________                                        __________

(ii) How does the p-value for testing H0: σ = 4 change as the sample standard deviation increases for a fixed value of the sample mean and the sample size? Explain why this happens.

(b)

(i) On the shiny app select G(μ,σ) as the distribution, Standard deviation (σ) as the Test for mean or standard deviation, 2 as the H0 value for σ, and 30 as the sample size, 0 as the sample mean, and 2.5 as the sample standard deviation. Record the value of the test statistic and p-value for testing H0: σ = 2.

(ii) On the shiny app select G(μ,σ) as the distribution, Standard deviation (σ) as the Test for mean or standard deviation, 4 as the H0 value for σ, and 30 as the sample size, 0 as the sample mean, and 5 as the sample standard deviation. Record the value of the test statistic and p-value for testing H0: σ = 4.

(iii) What do you notice about the values of the test statistic and p-value for these two cases? Explain why this happens.

(c) On the shiny app select G(μ,σ) as the distribution, Standard deviation (σ) as the Test for mean or standard deviation, 2 as the H0 value for σ, and 30 as the sample size, -1 as the sample mean, and 2.5 as the sample standard deviation. As you move the slider for the sample mean, you will see how the value of the test statistic and the corresponding p-value for testing H0: σ = 2  vary as the value of the sample mean varies.

(i) Use the shiny app to complete Table 7.