PSTAT 105: Final Exam Info

2022

1. Suppose that we observed the following 15 observations:

120.96 134.00 189.49 245.55 339.11 353.46 355.98 366.83 445.93 479.78 486.28 537.99 558.12 729.46 748.85

We want to estimate the density at t = 450.

(a) Calculate the kernel density estimator of the density at 450 for a rectangular kernel with bandwidth 50 (i.e. the kernel is constant from 400 to 500 and 0 elsewhere).

(b) Estimate the variance of this estimator.

(d) Why might we choose to take a bandwidth larger than 50?
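As an illustration of the calculation in question 1, here is a minimal R sketch. It assumes that "bandwidth 50" means a window of half-width 50 around 450, as the question describes, and it uses a binomial plug-in for the variance, which is one common choice rather than the only one. The names `x`, `t0`, `h`, `f.hat`, and `var.hat` are illustrative.

```r
# Observations from question 1
x <- c(120.96, 134.00, 189.49, 245.55, 339.11, 353.46, 355.98, 366.83,
       445.93, 479.78, 486.28, 537.99, 558.12, 729.46, 748.85)
t0 <- 450          # point at which the density is estimated
h  <- 50           # half-width of the rectangular window (assumption)
n  <- length(x)

# Number of observations falling strictly inside (t0 - h, t0 + h)
in.window <- sum(abs(x - t0) < h)

# Rectangular kernel estimate: proportion in the window / window width
f.hat <- in.window / (n * 2 * h)

# Plug-in variance estimate, treating the window count as Binomial(n, p)
p.hat   <- in.window / n
var.hat <- p.hat * (1 - p.hat) / (n * (2 * h)^2)

print(c(f.hat = f.hat, var.hat = var.hat))
```

A wider window would put more observations into `in.window`, which lowers `var.hat` at the cost of averaging the density over a longer interval.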

2. Ms Morgan takes a poll of her second grade class and asks each student how many children are in their family. The results were

Size of Family 1 2 3 4+

Count 24 16 6 5

(a) Suppose that we assume that the data has a geometric distribution with

\( P(X = k) = p^{k-1}(1 - p) \), for k = 1, 2, 3, 4, 5, 6, . . .

Find the maximum likelihood estimator of the probability p.

(b) We want to use a \( \chi^2 \) goodness of fit test to test our assumption that the data is from a geometric distribution. Calculate the appropriate test statistic.
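One way to approach question 2 numerically is sketched below in R. It assumes the "4+" category is treated as the censored event X ≥ 4, which has probability p^3 under the stated geometric model; this treatment is an assumption, not something the question spells out. The helper `cell.probs` and the other names are hypothetical.

```r
counts <- c(24, 16, 6, 5)      # family sizes 1, 2, 3, 4+
n <- sum(counts)

# Cell probabilities under P(X = k) = p^(k-1) (1 - p), with 4+ as X >= 4
cell.probs <- function(p) c(1 - p, p * (1 - p), p^2 * (1 - p), p^3)

# Grouped-data log-likelihood, maximised numerically over p
loglik <- function(p) sum(counts * log(cell.probs(p)))
fit    <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)
p.hat  <- fit$maximum

# Chi-squared statistic using the fitted cell probabilities
expected   <- n * cell.probs(p.hat)
chisq.stat <- sum((counts - expected)^2 / expected)

# One parameter was estimated, so df = cells - 1 - 1
df      <- length(counts) - 2
p.value <- pchisq(chisq.stat, df = df, lower.tail = FALSE)

print(c(p.hat = p.hat, chisq = chisq.stat, df = df, p.value = p.value))
```

Under this grouping the log-likelihood is 46 log(1 - p) + 43 log(p), so the numerical maximiser should agree with the closed form 43/89 up to the tolerance of `optimize`.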

3. We have 345 observations on the hourly log-return for a certain bond index, and we want to use a Kolmogorov–Smirnov test of the goodness of fit for a standard normal distribution.

(a) The first step in performing this test is to transform our data set returns,

Un <- pnorm(returns, mean=0, sd=1)

What is the purpose of this transformation?

(b) Our software calculates the upper and lower statistics as

\( D_n^+ = 0.0599 \), \( D_n^- = 0.1314 \)

Find an appropriate approximation for the P value of this test.
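A rough sketch of the approximation in question 3(b), in R. It assumes the two-sided statistic is the larger of the two one-sided values and uses only the leading term of the Kolmogorov limiting distribution, 2 exp(-2 n D^2), which is a large-sample approximation. The variable names are illustrative.

```r
n       <- 345
D.plus  <- 0.0599
D.minus <- 0.1314

# Two-sided KS statistic: the larger of the one-sided statistics
D.n <- max(D.plus, D.minus)

# Leading term of the asymptotic Kolmogorov tail probability
p.approx <- 2 * exp(-2 * n * D.n^2)
print(p.approx)
```

The remaining terms of the series are many orders of magnitude smaller at this value of sqrt(n) * D.n, so dropping them changes little here.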

4. In question 3, we used the specific null hypothesis of N(0, 1). However, we might not really believe that the mean log-return is 0, and so we decide to subtract the sample mean from all of the data first. This centers the data so that it now has mean 0.

(a) Would you expect the KS test statistic to be larger or smaller using the data with the mean removed? Why?

(b) The P value from question 3 would no longer be appropriate, so we decide to do a simulation study to approximate the probability. We produce 100,000 random replications of 345 standard normal random variables, and only 4,978 of those replications had a test statistic greater than or equal to our \( D_n \). Give a 95% confidence interval for our approximate P value.
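The interval in question 4(b) can be sketched in R as below, assuming the usual normal-approximation (Wald) interval for a binomial proportion; other intervals, such as the exact one from `binom.test`, would give very similar numbers at this sample size. The names `B` and `exceed` are illustrative.

```r
B      <- 100000   # number of simulated replications
exceed <- 4978     # replications with statistic >= observed value

p.hat <- exceed / B
se    <- sqrt(p.hat * (1 - p.hat) / B)

# Normal-approximation 95% confidence interval for the simulated P value
ci <- p.hat + c(-1, 1) * qnorm(0.975) * se
print(ci)
```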

5. We have 25 data observations, and we want to check if they are uniformly distributed between 0 and 10.

0.1 2.4 3.4 5.4 7.3

0.4 2.6 4.3 5.6 7.4

0.4 2.6 4.8 6.0 7.9

1.9 2.6 5.1 6.5 8.1

1.9 3.3 5.2 6.8 8.5

(a) Use a \( \chi^2 \) goodness of fit test with four intervals to test the null hypothesis that these are from a uniform distribution. Please explain your conclusions clearly.

(b) Alternatively, we could use the Kolmogorov–Smirnov goodness of fit test for the data and hypothesis from part (a). Calculate the test statistic D for this test.

(d) What is the advantage of the Kolmogorov–Smirnov test over the \( \chi^2 \) test in this problem?
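Both statistics in question 5 can be computed with a short R sketch. It assumes four equal-width intervals on [0, 10], which is one natural choice but not the only valid one, and it computes the KS statistic directly from the order statistics rather than through `ks.test`, because the data contain ties. The variable names are illustrative.

```r
x <- c(0.1, 0.4, 0.4, 1.9, 1.9, 2.4, 2.6, 2.6, 2.6, 3.3, 3.4, 4.3, 4.8,
       5.1, 5.2, 5.4, 5.6, 6.0, 6.5, 6.8, 7.3, 7.4, 7.9, 8.1, 8.5)
n <- length(x)

# Chi-squared test with four equal-width intervals (assumed choice)
breaks   <- c(0, 2.5, 5, 7.5, 10)
observed <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
expected <- n * diff(breaks) / 10
chisq.stat <- sum((observed - expected)^2 / expected)
p.value    <- pchisq(chisq.stat, df = length(observed) - 1, lower.tail = FALSE)

# KS statistic against Uniform(0, 10), from the sorted transformed data
u       <- sort(x) / 10
D.plus  <- max((1:n) / n - u)
D.minus <- max(u - (0:(n - 1)) / n)
D       <- max(D.plus, D.minus)

print(c(chisq = chisq.stat, p.value = p.value, D = D))
```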

6. We collected wind speeds on 20 days in November.

73.7 85.8 87.6 113.5 123.4

141.1 153.9 156.2 182.7 182.9

206.1 206.9 278.6 290.1 328.3

339.8 352.3 373.6 381.8 449.9

(a) Assuming that these are independent observations, estimate the probability that speed is less than 300.

(b) Give a 95% confidence interval for your probability estimate.

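(c) Use the kernel

\[ K_{30}(t) = \begin{cases} \dfrac{1}{30} + \dfrac{t}{900}, & -30 < t < 0 \\ \dfrac{1}{30} - \dfrac{t}{900}, & 0 < t < 30 \\ 0, & \text{elsewhere} \end{cases} \]

to find an estimate of the density of the data at 300.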

(d) If this kernel \( K_{30}(t) \) has a bandwidth of 30, then write down the kernel function \( K_{90}(t) \) that has a bandwidth of 90.

(e) What is the advantage of using the wider bandwidth kernel?

(f) In order to estimate the standard deviation of this kernel estimator, I decided to try to use a bootstrap methodology.

> sds <- rep(-999,8)
> for( j in 1:8) {
+ boot.wind <- matrix(sample(wind.speed,size=20000,replace=TRUE),ncol=20)
+ kern.bw <- (1/30 - abs(boot.wind-300)/900)*(abs(boot.wind-300)<30)
+ hat.f <- rowMeans(kern.bw)
+ sds[j] <- sd(hat.f)
+ }
> signif(sds,4)
[1] 0.001104 0.001158 0.001137 0.001154 0.001133 0.001200 0.001155 0.001171
> mean(sds)
[1] 0.001151571
> sd(sds)
[1] 2.817011e-05

What is our bootstrap estimator of \( \mathrm{sd}(\hat{f}) \)? Give an approximate margin of error for this estimator.
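For reference, the point estimate that the bootstrap is assessing can be sketched in R as follows. It reuses the kernel expression from the bootstrap code above, (1/30 - |t|/900) on |t| < 30, applied to the original 20 wind speeds. The name `wind.speed` matches the transcript; `d`, `kern`, and `f.hat` are illustrative.

```r
wind.speed <- c(73.7, 85.8, 87.6, 113.5, 123.4,
                141.1, 153.9, 156.2, 182.7, 182.9,
                206.1, 206.9, 278.6, 290.1, 328.3,
                339.8, 352.3, 373.6, 381.8, 449.9)

# Distance of each observation from the evaluation point 300
d <- abs(wind.speed - 300)

# Triangular kernel weights, zero outside a distance of 30
kern <- (1/30 - d / 900) * (d < 30)

# Kernel density estimate at 300: the average kernel weight
f.hat <- mean(kern)
print(f.hat)
```

Only the observations 278.6, 290.1, and 328.3 lie within 30 of 300, so only three terms contribute to the average.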

7. The following data was collected on the lifetimes of rubber gaskets in marine applications.

10 12 37 55 59 63 68 77

109 111 126 158 162 163 188 197

I want to test whether this data is from an Exponential distribution with mean 100. \( P_0(X < x) = 1 - e^{-x/100} \).

(a) If we are going to use a \( \chi^2 \) test, first we need to divide this data into 3 appropriate intervals. Calculate the expected values from the null hypothesis for each of the intervals you have chosen.

(b) Find the \( \chi^2 \) test statistic and compare it to the correct critical value from the table for an α = 0.05 level test. What do you conclude?

(d) Calculate the approximate P value from this KS test statistic.

> data <- c(10,12,37,55,59,63,68,77,109,111,126,158,162,163,188,197)
> pexp(data, rate=1/100)
[1] 0.095162582 0.113079563 0.309265669 0.42305019 0.445672715
[6] 0.467408199 0.493383008 0.536986932 0.663783506 0.670441039
[11] 0.716345974 0.794024902 0.802101301 0.804070426 0.847409894
[16] 0.860543144
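The two tests in question 7 can be sketched in R as below. The three intervals are chosen here to be equiprobable under the null hypothesis, which is an assumption; the question leaves the choice open. The KS P value uses the one-term asymptotic approximation 2 exp(-2 n D^2), which is only rough for n = 16. The vector is named `lifetimes` here purely for illustration.

```r
lifetimes <- c(10, 12, 37, 55, 59, 63, 68, 77,
               109, 111, 126, 158, 162, 163, 188, 197)
n <- length(lifetimes)

# Chi-squared test with three intervals of null probability 1/3 each
breaks   <- c(0, qexp(c(1/3, 2/3), rate = 1/100), Inf)
observed <- as.vector(table(cut(lifetimes, breaks = breaks)))
expected <- rep(n / 3, 3)
chisq.stat <- sum((observed - expected)^2 / expected)
crit       <- qchisq(0.95, df = 2)   # no parameters estimated: df = 3 - 1

# KS statistic from the null CDF evaluated at the sorted data
u <- pexp(sort(lifetimes), rate = 1/100)
D <- max(max((1:n) / n - u), max(u - (0:(n - 1)) / n))

# One-term asymptotic approximation to the KS P value
p.approx <- 2 * exp(-2 * n * D^2)

print(c(chisq = chisq.stat, crit = crit, D = D, p.approx = p.approx))
```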

8. From the data in question 7, we want to estimate the density of this data.

(a) Calculate a 95% confidence interval for p = P(X < 100).

(b) Use a kernel density estimator with a rectangular kernel which has a bandwidth h = 25 to estimate the density at x = 100.

(d) What advantage would there be to using a smaller bandwidth?

9. The median from the data in question 7 is (77 + 109)/2 = 93. We want to perform a test of the null hypothesis \( H_0 : X_i \sim \text{Exponential}(100) \). We decide to use the sample median as the test statistic, with observed value T = 93, and reject the null hypothesis for large values of T.

(a) Write R code which uses simulation to calculate an approximate P value for this test. Run this code for 1000 simulated samples.

(b) Find a rough estimate for how long it would take to run this code on \( 10^9 \) samples. Clearly show and explain your calculations.
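A possible shape for the simulation in question 9 is sketched below in R. It assumes each simulated sample has the same size as the data (16 observations) and that the larger run is 10^9 samples; the seed is arbitrary, and the timing extrapolation assumes run time scales linearly with the number of samples. All variable names are illustrative.

```r
set.seed(105)      # arbitrary seed, only for reproducibility

n.sim <- 1000      # number of simulated samples
n     <- 16        # observations per sample, matching the data
t.obs <- 93        # observed sample median

# Simulate medians under H0 and time the run
elapsed <- system.time(
  sim.medians <- replicate(n.sim, median(rexp(n, rate = 1/100)))
)["elapsed"]

# Approximate P value: share of simulated medians at least as large as 93
p.value <- mean(sim.medians >= t.obs)

# Linear extrapolation of run time to 1e9 samples (assumed scaling)
est.seconds <- unname(elapsed) * 1e9 / n.sim

print(c(p.value = p.value, est.seconds = est.seconds))
```

Timings from a run this short are noisy, so timing a longer pilot run would give a steadier extrapolation.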

10. Using a data set of teacher salaries, we tabulate the salaries according to some convenient ranges

Salary level (in $1000s)   < 40   40–60   60–80   80–100   > 100
Count                       52     75      177     267      9

The average salary among the teachers was x̄ = $72,922, with an estimated standard deviation of s = $19,943.

(a) Use a chi-squared goodness of fit test to decide whether or not it is reasonable to consider this data as coming from a normal distribution with some mean and standard deviation. Please carefully show the steps you are taking and the final conclusion that you draw.

(b) What advantage does this test have over the Lilliefors test?
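The test in question 10(a) might be set up in R as follows. The sketch assumes the five salary ranges are used as the cells, with the outer two extended to infinity, and that the sample mean and standard deviation are plugged in as the normal parameters, which costs two degrees of freedom. Variable names are illustrative.

```r
counts <- c(52, 75, 177, 267, 9)          # observed counts per salary range
breaks <- c(-Inf, 40, 60, 80, 100, Inf)   # range limits in $1000s

x.bar <- 72.922    # sample mean in $1000s
s     <- 19.943    # sample standard deviation in $1000s

# Cell probabilities under the fitted normal distribution
probs    <- diff(pnorm(breaks, mean = x.bar, sd = s))
expected <- sum(counts) * probs

chisq.stat <- sum((counts - expected)^2 / expected)

# Two estimated parameters: df = cells - 1 - 2
df      <- length(counts) - 3
crit    <- qchisq(0.95, df = df)
p.value <- pchisq(chisq.stat, df = df, lower.tail = FALSE)

print(c(chisq = chisq.stat, df = df, crit = crit, p.value = p.value))
```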

11. A trial measured the relative efficacy of a treatment for a variety of doses.

Dose       0      3      3      4      5      6      6      7      8      10

Efficacy   2.56   0.78   4.79   2.44   1.01   1.05   -1.07  -1.37  -6.18  -12.03

We want to model the efficacy as the dependent variable in a nonparametric regression model. In particular, we’ll use a Nadaraya–Watson kernel estimator with an Epanechnikov kernel with bandwidth 2,

\( K_2(t) = \frac{3}{8}\left(1 - \frac{t^2}{4}\right) \), for \( -2 < t < 2 \).

(a) Calculate the regression estimate this gives when the dose is 2.

(b) Calculate the bias in this regression estimate at the point where Dose = 2 if the true mean function is

\( E(\text{Efficacy}) = 3 + \text{Dose} - \text{Dose}^2 \)

(c) If instead I used a bandwidth of 3, will the bias be larger or smaller? What is the benefit to using a larger bandwidth?

(d) How would our estimate change if we wanted to fit a local linear model using the same kernel function, \( K_2(t) \)? Show me your calculations.
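The Nadaraya–Watson estimate in question 11 can be sketched in R as below. The kernel is written without its normalising constant, since any constant cancels in the ratio; the helper names `epan` and `nw` are hypothetical, and the kernel is assumed to be zero at and beyond a distance of one bandwidth.

```r
dose     <- c(0, 3, 3, 4, 5, 6, 6, 7, 8, 10)
efficacy <- c(2.56, 0.78, 4.79, 2.44, 1.01, 1.05, -1.07, -1.37, -6.18, -12.03)

# Unnormalised Epanechnikov weights with bandwidth h (zero outside |t| < h)
epan <- function(t, h) pmax(0, 1 - (t / h)^2)

# Nadaraya-Watson estimate at x0: kernel-weighted average of the responses
nw <- function(x0, h) {
  w <- epan(dose - x0, h)
  sum(w * efficacy) / sum(w)
}

est.h2 <- nw(2, h = 2)   # bandwidth 2
est.h3 <- nw(2, h = 3)   # bandwidth 3, for comparison
print(c(h2 = est.h2, h3 = est.h3))
```

With bandwidth 2 only the two observations at dose 3 receive positive weight, so the estimate is their plain average; bandwidth 3 also brings in the observations at doses 0 and 4.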

12. For this question, you will need to download the VV.txt data set from GauchoSpace. Please include R input and output in your written answer so that we can see the steps you used to generate your answer. You should also include any explanations that are necessary.

The data comes from 10,000 simulations of a variance estimator to be used in a GEE procedure. My previous calculations suggest that the simulation outcomes should be distributed like \( aX^2 \), where \( X^2 \) is a chi-squared random variable with 8.7 degrees of freedom and

a = 0.0088

(a) Perform a Kolmogorov–Smirnov goodness of fit test to check our null hypothesis of a scaled chi-squared distribution.

(b) Whether or not the entire distribution is chi-squared, it is most important (if I’m going to use this as a test statistic) that the 95th percentile of the data matches my hypothesis. Perform a hypothesis test that checks whether the 95th percentile of my observations is consistent with the 95th percentile of the \( \chi^2_{8.7} \) distribution times a.

(c) Draw a plot that compares the data from the simulations to the properly scaled chi-squared density and to a kernel density estimate. Use a Gaussian kernel and a cross-validation method to choose the kernel bandwidth. Properly label your plot.

(d) On my laptop, it took 5.28 seconds to perform the \( 10^5 \) simulations to produce the VV.txt data set. I’m interested in estimating the probability that an observation is bigger than 0.02, and I think I need to do more simulations to get a more accurate answer. If I want to estimate the probability of being bigger than 0.02 to within a margin of error of \( 0.00001 = 10^{-5} \), about how long will this simulation take?