STAT 231 Winter 2023 – Assignment 1

Due: January 26 2023, 11:59PM

Total number of questions: 3

Total points: 30

Instructions: submit your work as .pdf files through Crowdmark. For the questions asking for a plot, make the plot in R and upload a pdf of the plot directly to the relevant question on Crowdmark. Specific instructions will be provided on Crowdmark.

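1. (10 points) Let $y_1, \dots, y_n \in \mathbb{R}$ be data, where $n \in \mathbb{N}$. Let $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ and $S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2$.

(a) (2 points) Show that $\sum_{i=1}^n (y_i - \bar{y}) = 0$ and $S_{yy} = \sum_{i=1}^n y_i^2 - n\bar{y}^2$.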

See Solutions to Chapter 1 Problems 1.1 (a) and 1.2 (a). 1 point per derivation.
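For reference, here is a sketch of both derivations. Each one uses only the identity $\sum_{i=1}^n y_i = n\bar{y}$:

```latex
\begin{align*}
\sum_{i=1}^n (y_i - \bar{y}) &= \sum_{i=1}^n y_i - n\bar{y} = n\bar{y} - n\bar{y} = 0 \\
S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2
  &= \sum_{i=1}^n y_i^2 - 2\bar{y}\sum_{i=1}^n y_i + n\bar{y}^2 \\
  &= \sum_{i=1}^n y_i^2 - 2n\bar{y}^2 + n\bar{y}^2
   = \sum_{i=1}^n y_i^2 - n\bar{y}^2
\end{align*}
```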

(b) (2 points) Gas in Canada is sold in litres and in the US is sold in gallons. One US gallon is apparently equal to 3.79 litres, for some reason. Let $y_1, \dots, y_n$ represent the amount of gas in litres from $n$ fill-ups for a typical Canadian driver, and let $z_1, \dots, z_n$ represent these same fill-ups measured in American gallons.

(i) (1 point) Write $z_i$ in terms of $y_i$, and write $y_i$ in terms of $z_i$.

$z_i = y_i / 3.79$ and $y_i = 3.79 z_i$.

(ii) (0.5 points) Suppose $\bar{y} = 45$; find $\bar{z}$.

$\bar{z} = \bar{y}/3.79 = 45/3.79 = 11.87$

(iii) (0.5 points) Suppose $S_{zz} = 20$; find $S_{yy}$.

$S_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (3.79 z_i - 3.79\bar{z})^2 = 3.79^2 \sum_{i=1}^n (z_i - \bar{z})^2 = 3.79^2 S_{zz} = 287.28$
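You can check both scaling rules numerically in R. This is a minimal sketch. The five fill-up amounts are made-up values, chosen so that $\bar{y} = 45$. `S()` is a small helper defined in the snippet, not a built-in function.

```r
# Unit-conversion check for part (b). The fill-ups are hypothetical values.
litres_per_gallon <- 3.79
y <- c(40, 52, 38, 47, 48)        # hypothetical fill-ups in litres (mean 45)
z <- y / litres_per_gallon        # the same fill-ups in US gallons

# Helper: sum of squared deviations from the mean
S <- function(x) sum((x - mean(x))^2)

all.equal(mean(z), mean(y) / litres_per_gallon)  # TRUE: means scale by 1/3.79
all.equal(S(y), litres_per_gallon^2 * S(z))      # TRUE: S scales by 3.79^2

round(45 / litres_per_gallon, 2)    # 11.87, the answer to (ii)
round(litres_per_gallon^2 * 20, 2)  # 287.28, the answer to (iii)
```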

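(c) (4 points)

(i) (2 points) Construct a data set $y_1, y_2$ that has $\bar{y} = 0$ and $S_{yy} = M$.

$y_1 = \sqrt{M/2}$, $y_2 = -\sqrt{M/2}$. This requires $M \ge 0$, since $S_{yy} \ge 0$ always.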

(ii) (2 points) Construct a data set $y_1, y_2$ that has $\bar{y} = M$ and $S_{yy} = 0$.

$y_1 = y_2 = M$
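Here is a quick R check of both constructions for one value of $M$. The choice $M = 10$ is arbitrary, and `S()` is again a helper defined in the snippet.

```r
# Check the two constructions in (c) for one illustrative value of M.
# M = 10 is an arbitrary choice; part (i) needs M >= 0.
S <- function(x) sum((x - mean(x))^2)   # sum of squared deviations
M <- 10

y_i  <- c(sqrt(M / 2), -sqrt(M / 2))    # (i): mean 0 and S_yy = M
y_ii <- c(M, M)                         # (ii): mean M and S_yy = 0

c(mean = mean(y_i),  S = S(y_i))
c(mean = mean(y_ii), S = S(y_ii))
```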

(d) (2 points) Suppose we observe data taking values in a set $\Omega$ ("sample space"), i.e. $y_i \in \Omega$ for each $i = 1, \dots, n$. Repeated values are allowed. Suppose $\bar{y} = 1$ and $n = 2$. For each of the following sets, either find two different data sets with this $n$ and $\bar{y}$ but different $S_{yy}$, or calculate $S_{yy}$ based only on this information.

(i) (0.5 points) $\Omega = \{0, 1\}$

$y_i^2 = y_i$ since each $y_i$ is either 0 or 1. With $n = 2$ and $\bar{y} = 1$ we need $y_1 + y_2 = 2$, which forces $y_1 = y_2 = 1$. So $\sum_{i=1}^2 y_i^2 = \sum_{i=1}^2 y_i = 2$ and $S_{yy} = 2 - 2 \times 1^2 = 0$.

(ii) (0.5 points) $\Omega = \{0, 1, 2\}$

Counterexample: $\{1, 1\}$ has $S_{yy} = 0$, while $\{0, 2\}$ has $S_{yy} = 4 - 2 \times 1^2 = 2$.

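(iii) (0.5 points) $\Omega = \mathbb{Z}$

Counterexample: same as (ii). You can also try e.g. $\{-a, a + 2\}$ and $\{-b, b + 2\}$ for any $a, b \in \mathbb{Z}$ with $(a + 1)^2 \neq (b + 1)^2$. This works because $\{-a, a + 2\}$ has $\bar{y} = 1$ and $S_{yy} = 2(a + 1)^2$.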

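(iv) (0.5 points) $\Omega = \mathbb{R}$

Counterexample: same as (ii). You can also try e.g. $\{-a, a + 2\}$ and $\{-b, b + 2\}$ for any $a, b \in \mathbb{R}$ with $(a + 1)^2 \neq (b + 1)^2$. This works because $\{-a, a + 2\}$ has $\bar{y} = 1$ and $S_{yy} = 2(a + 1)^2$.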
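You can check the counterexamples in R with the same kind of helper `S()`, defined in the snippet. The values of `a` are arbitrary illustrations. Note that $a = -3$ and $a = 1$ give the same $S_{yy}$, so not every pair of distinct values works.

```r
# S_yy for two-point data sets with n = 2 and mean 1, as in part (d).
S <- function(x) sum((x - mean(x))^2)   # sum of squared deviations

S(c(1, 1))   # 0
S(c(0, 2))   # 2

# The family {-a, a + 2} always has mean 1 and S_yy = 2 * (a + 1)^2.
# a = -3 and a = 1 give the same S_yy because (a + 1)^2 agrees.
a <- c(-3, -1, 0, 1, 2.5)
sapply(a, function(k) S(c(-k, k + 2)))   # 8.0 0.0 2.0 8.0 24.5
2 * (a + 1)^2                            # the same values
```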

2. (10 points) Suppose an independent sample $y_1, \dots, y_n$ is obtained, where each $y_i \in \{0, 1, 2, \dots\}$ is a non-negative integer. Consider a Poisson model for the $y_i$. Denote the unknown parameter of the Poisson distribution by $\lambda > 0$.

(a) (1 point) Write down the probability density/mass function for a single $y_i$.

$P(Y_i = y_i; \lambda) = \lambda^{y_i} e^{-\lambda} / y_i!$

(b) (1 point) Write down the joint probability density/mass function of $y_1, \dots, y_n$.

By independence, $P(Y_1 = y_1, \dots, Y_n = y_n; \lambda) = \prod_{i=1}^n P(Y_i = y_i; \lambda) = \lambda^{n\bar{y}} e^{-n\lambda} / \prod_{i=1}^n y_i!$

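(c) (3 points) Suppose $n = 5$ and the observed data are $y_1 = 7$, $y_2 = 2$, $y_3 = 3$, $y_4 = 0$, $y_5 = 4$.

(i) (1 point) What is the probability of observing these data if $\lambda = 3$?

$P(Y_1 = 7, Y_2 = 2, Y_3 = 3, Y_4 = 0, Y_5 = 4; \lambda = 3) = 3^{16} e^{-15} / (7!\,2!\,3!\,0!\,4!) = 9.07 \times 10^{-6}$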

(ii) (1 point) What is the probability of observing these data, as a function of the unknown λ?

$P(Y_1 = 7, Y_2 = 2, Y_3 = 3, Y_4 = 0, Y_5 = 4; \lambda) = \lambda^{16} e^{-5\lambda} / 1451520$, where $1451520 = 7!\,2!\,3!\,0!\,4!$.
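You can reproduce both parts of (c) in R. Under independence, the product of `dpois()` values gives the joint probability, and it agrees with the closed form from (b). This is a sketch; `lik()` is a helper name chosen here.

```r
# Probability of the observed data under independent Poisson(lambda) draws,
# computed with dpois() and with the closed form from (b).
y <- c(7, 2, 3, 0, 4)
n <- length(y)

prod(factorial(y))                             # 1451520
prod(dpois(y, lambda = 3))                     # about 9.07e-06
3^sum(y) * exp(-n * 3) / prod(factorial(y))    # the same value

# Part (ii): the same probability as a function of lambda
lik <- function(lambda) lambda^sum(y) * exp(-n * lambda) / prod(factorial(y))
lik(3)                                         # matches the values above
```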

(iii) (1 point) Compute $\bar{y}$ and explain why this is a reasonable/plausible guess at the unknown $\lambda$.

$\bar{y} = 16/5 = 3.2$. If $Y \sim \text{Poisson}(\lambda)$ then $E(Y) = \lambda$. So it is intuitively plausible that $\bar{y}$, the mean of the data, should be close to $\lambda$, the mean of the distribution from which we think the data were generated.
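As a complementary check (not part of the question), the probability from (ii), viewed as a function of $\lambda$, is largest at $\lambda = \bar{y}$. The sketch below maximizes its logarithm numerically with `optimize()`. The search interval `c(0.01, 20)` is an arbitrary choice.

```r
# Log-probability of the observed data as a function of lambda.
y <- c(7, 2, 3, 0, 4)
loglik <- function(lambda) sum(dpois(y, lambda, log = TRUE))

mean(y)                                                           # 3.2
optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum  # close to 3.2
```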

(d) (2 points) Use the code below to create a bar graph of these data with Poisson(3.2) probability mass function values on the y-axis, and the probability mass function drawn over it as a curve. Comment on whether you think the Poisson(3.2) model fits the data well.

y <- c(7, 2, 3, 0, 4)                                # observed data
hist(y, breaks = 8, freq = FALSE, ylim = c(0, 1))    # density histogram of the data
points(x = 0:7, y = dpois(0:7, 3.2), pch = 20)       # Poisson(3.2) pmf at 0, 1, ..., 7

With only five observations, it is not obvious whether the Poisson(3.2) model fits well.
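A possible variant of the (d) plot is sketched below as an alternative to the code the question prescribes. It draws one bar per integer value with `barplot()` and joins the Poisson(3.2) probabilities with a line, so each bar sits directly above its value and the probabilities read as a curve.

```r
# Variant of the plot in (d): one bar per integer value, with the
# Poisson(ybar) probabilities joined by a line.
y <- c(7, 2, 3, 0, 4)
k <- 0:7                                            # values shown on the x-axis
obs <- table(factor(y, levels = k)) / length(y)     # observed proportions
mids <- barplot(obs, ylim = c(0, 1), xlab = "y",
                ylab = "Proportion / probability")  # returns bar midpoints
lines(as.vector(mids), dpois(k, mean(y)), type = "b", pch = 20)
```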

(e) (3 points) Suppose instead that $\{y_1, \dots, y_5\} = \{16, 0, 0, 0, 0\}$. What is the probability of observing these data, as a function of the unknown $\lambda$? What is $\bar{y}$, and hence a plausible estimate of $\lambda$? Produce the same graph as in (d): does the Poisson model fit the data well?

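Probability of observing the data: $\lambda^{16} e^{-5\lambda} / 16!$. The sum is again 16, but now $\prod_{i=1}^5 y_i! = 16!\,0!\,0!\,0!\,0! = 16!$. As a function of $\lambda$ this is proportional to the answer in (c) (ii); only the constant differs. $\bar{y}$: same as (c) (iii), $\bar{y} = 3.2$. Fit: no, the Poisson model does not fit the data well. Observing $y_i = 16$ and observing four zeroes both have very low probability under this model.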
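Here is a short R check of the answer to (e), reusing the data from (c). The ratio of the two probabilities is the same for every $\lambda$; it equals $16!/1451520$. The individual Poisson(3.2) probabilities of a 16 and of four zeroes are small. The $\lambda$ values tried are arbitrary.

```r
# Compare the data sets from (c) and (e). Both have n = 5 and sum 16, so the
# ratio of their probabilities does not depend on lambda.
y_c <- c(7, 2, 3, 0, 4)
y_e <- c(16, 0, 0, 0, 0)
lik <- function(y, lambda) prod(dpois(y, lambda))

lambdas <- c(1, 3.2, 6)
ratios <- sapply(lambdas, function(l) lik(y_c, l) / lik(y_e, l))
ratios                                   # constant in lambda
factorial(16) / prod(factorial(y_c))     # 16!/1451520 = 14414400

# Under Poisson(3.2), a single 16 and four zeroes are each unlikely:
dpois(16, 3.2)     # about 2.4e-07
dpois(0, 3.2)^4    # about 2.8e-06
```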

3. (10 points) (A. C. Davison (2008), Statistical Models, Example 2.3). Run the following code to obtain data on the delivery times of n patients at the John Radcliffe Hospital in Oxford over several days:

install.packages("SMPracticals")          # only needed once
library(SMPracticals)
data(births, package = "SMPracticals")    # load the births data set
head(births)                              # show the first six rows

head(births) shows the columns day and time; the first six delivery times are 19.0, 9.5, 3.4, 7.3, 16.0 and 8.5.

Answer the following questions. You can type ?births to get some information about the data.

(a) (1 point) What is the sample size n?

n = nrow(births) = 95

(b) (3 points) What are the mean, median, standard deviation, and IQR of the delivery times? Comment on whether you think there are any outliers in the data.

with(births, summary(time)) gives $\bar{y} = 7.723$ and $\hat{m} = 7.5$.

with(births, sd(time)) = 3.57 and with(births, IQR(time)) = 4.8.

The two measures of centre, the mean (7.723) and the median (7.5), are close, indicating no serious outliers. Likewise, the standard deviation (3.57) is not large relative to the IQR (4.8), which points to the same conclusion.
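The comparison of the standard deviation with the IQR can be made concrete. For a Gaussian distribution, the IQR is about 1.35 standard deviations. An IQR of 4.8 therefore corresponds to a standard deviation of about 3.56, very close to the reported 3.57. This minimal check uses only the summary values reported above:

```r
# For a Gaussian distribution, IQR = (z_0.75 - z_0.25) * sd, about 1.35 * sd.
gauss_ratio <- diff(qnorm(c(0.25, 0.75)))
gauss_ratio          # about 1.349
4.8 / 3.57           # IQR / sd for the delivery times, about 1.345
4.8 / gauss_ratio    # sd implied by the IQR under a Gaussian, about 3.56
```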

(c) (3 points) Plot the empirical cumulative distribution function of delivery time. Comment on how long “most” deliveries take. Overlay the CDF of a Gaussian distribution with mean and standard deviation equal to the mean and standard deviation of the delivery times. Comment on how well you think the Gaussian model fits the data.

“Most” births seem to take less than 15 hours, where the ECDF levels off. The Gaussian CDF seems very close to the ECDF indicating good fit.
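Here is a sketch of the plot for (c). `plot_ecdf_gauss()` is a hypothetical helper written for this illustration. It takes the delivery times as a numeric vector (for the assignment, `births$time`) and assumes the times are in hours, as the answer above suggests.

```r
# Plot the ECDF of a sample and overlay the CDF of a Gaussian with the
# sample mean and standard deviation. plot_ecdf_gauss is a hypothetical helper.
plot_ecdf_gauss <- function(times) {
  m <- mean(times)
  s <- sd(times)
  plot(ecdf(times), main = "ECDF of delivery time with Gaussian CDF",
       xlab = "Delivery time (hours)", ylab = "Cumulative proportion")
  # curve() evaluates its expression over a grid named x; m and s are
  # looked up in this function's environment
  curve(pnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
  invisible(c(mean = m, sd = s))
}

# Usage with the assignment data (requires SMPracticals to be loaded):
# with(births, plot_ecdf_gauss(time))
```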

(d) (1 point) Plot a density histogram of the observed data. Overlay a Gaussian density with mean and standard deviation equal to the mean and standard deviation of the delivery times. Comment on how well you think the Gaussian model fits the data.
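A similar sketch for (d) follows. `plot_hist_gauss()` is again a hypothetical helper that takes a numeric vector of delivery times. The y-axis is extended so the peak of the Gaussian density is not cut off.

```r
# Density histogram of a sample with a fitted Gaussian density overlaid.
# plot_hist_gauss is a hypothetical helper name.
plot_hist_gauss <- function(times) {
  m <- mean(times)
  s <- sd(times)
  h <- hist(times, plot = FALSE)
  # make the y-axis tall enough for both the bars and the density's peak
  ylim <- c(0, max(h$density, dnorm(m, mean = m, sd = s)))
  plot(h, freq = FALSE, ylim = ylim,
       main = "Delivery time with Gaussian density",
       xlab = "Delivery time (hours)")
  curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
  invisible(h)
}

# Usage with the assignment data (requires SMPracticals to be loaded):
# with(births, plot_hist_gauss(time))
```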