PSTAT 105: Solutions to Practice Final Problems

2022

1. Suppose that we observed the following 15 observations

120.96 134.00 189.49 245.55 339.11 353.46 355.98 366.83 445.93 479.78 486.28 537.99 558.12 729.46 748.85

We want to estimate the density at t = 450.

(a) The kernel density estimator of the density at 450 for a rectangular kernel with bandwidth 50 (i.e. the kernel is constant from 400 to 500 and 0 elsewhere) is

$$\hat{f}(450) = \frac{\hat{F}(500) - \hat{F}(400)}{100} = \frac{\text{the number of observations in } [400, 500]}{15 \times 100} = \frac{3}{1500} = 0.002$$

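(b) The variance of this estimator is

$$\mathrm{Var}(\hat{f}) = \left(\frac{1}{100}\right)^2 \frac{\hat{p}(1 - \hat{p})}{n}, \qquad \hat{p} = \frac{3}{15} = 0.2$$

$$= \frac{(0.2)(0.8)}{15 \times 100^2} = 1.067 \times 10^{-6}$$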
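The two calculations in parts (a) and (b) can be checked with a short R sketch. The function names `rect_kde` and `rect_kde_var` are illustrative and not part of the original solution; the variance uses the plug-in binomial approximation, which is an assumption about how the estimator's variance is being estimated.

```r
# Observations from Problem 1
x <- c(120.96, 134.00, 189.49, 245.55, 339.11, 353.46, 355.98, 366.83,
       445.93, 479.78, 486.28, 537.99, 558.12, 729.46, 748.85)

# Rectangular-kernel density estimate at t with half-width h:
# (fraction of observations within h of t) / (window width 2h)
rect_kde <- function(x, t, h) {
  p_hat <- mean(abs(x - t) <= h)
  p_hat / (2 * h)
}

# Plug-in variance: the count in the window is treated as Binomial(n, p)
rect_kde_var <- function(x, t, h) {
  n <- length(x)
  p_hat <- mean(abs(x - t) <= h)
  p_hat * (1 - p_hat) / (n * (2 * h)^2)
}

f_hat <- rect_kde(x, 450, 50)
v_hat <- rect_kde_var(x, 450, 50)
print(c(f_hat = f_hat, var = v_hat))
```

Calling `rect_kde_var` with a larger `h` is one way to see how the variance changes when the window is widened.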

(d) We may choose to take a bandwidth larger than 50 in order to decrease the variance of the estimator. This will result in a smoother density estimate.

2. Ms. Morgan takes a poll of her second grade class and asks each student how many children are in their family. The results were

Size of Family   1         2           3             4+
Count            24        16          6             5
Prob             (1 - p)   p(1 - p)    p^2 (1 - p)   p^3

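(a) The likelihood is (we treat the last column as observations censored at 3)

$$l(p) = (1 - p)^{24} \, p^{16}(1 - p)^{16} \, p^{12}(1 - p)^{6} \, p^{15}$$

so the log-likelihood is

$$\log l(p) = 43 \log p + 46 \log(1 - p)$$

Setting the derivative to zero gives the MLE:

$$\frac{d}{dp} \log l(p) = \frac{43}{p} - \frac{46}{1 - p} = 0 \implies \hat{p} = \frac{43}{89} = 0.48315$$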
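As a sanity check on the derivation, the log-likelihood can be maximized numerically and compared with the closed form 43/89. This is only a sketch; the search interval passed to `optimize` is an arbitrary choice made here.

```r
counts <- c(24, 16, 6, 5)   # family sizes 1, 2, 3, 4+

# Log-likelihood with cell probabilities (1-p), p(1-p), p^2(1-p), p^3
loglik <- function(p) {
  probs <- c(1 - p, p * (1 - p), p^2 * (1 - p), p^3)
  sum(counts * log(probs))
}

# Closed form from the derivation: 43 / (43 + 46)
p_closed <- 43 / 89

# Numerical maximization as an independent check
p_numeric <- optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)$maximum

print(c(closed = p_closed, numeric = p_numeric))
```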

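(b) The probability of each column can now be calculated using the MLE. For example, the probability for a family of size 3 is (43/89)^2 (46/89) = 0.1206.

Size of Family   1        2        3        4+
Count            24       16       6        5
Prob             0.5169   0.2497   0.1206   0.1128
Expected         26.36    12.74    6.15     5.75

where the expected values are 51 times the probability.

The statistic is

$$X^2 = \frac{(24 - 26.36)^2}{26.36} + \frac{(16 - 12.74)^2}{12.74} + \frac{(6 - 6.15)^2}{6.15} + \frac{(5 - 5.75)^2}{5.75}$$

$$= 0.2112 + 0.8368 + 0.0038 + 0.0983 = 1.15.$$

One parameter was estimated, so we compare this to a χ2 distribution with 4 - 1 - 1 = 2 degrees of freedom.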

> 1 - pchisq(1.15,2)

[1] 0.5627049
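The expected counts and the statistic can be reproduced from the MLE with a few lines of R. This is a sketch of the calculation above, using the same cell probabilities; the variable names are chosen here for illustration.

```r
counts <- c(24, 16, 6, 5)
p_hat <- 43 / 89

# Cell probabilities under the fitted model
probs <- c(1 - p_hat, p_hat * (1 - p_hat), p_hat^2 * (1 - p_hat), p_hat^3)
expected <- sum(counts) * probs

# Pearson chi-square statistic
x2 <- sum((counts - expected)^2 / expected)

# 4 cells, minus 1, minus 1 estimated parameter
df <- length(counts) - 1 - 1
p_value <- 1 - pchisq(x2, df)

print(round(expected, 2))
print(c(X2 = x2, p.value = p_value))
```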

3. We have 345 observations on the hourly log-return for a certain bond index, and we want to use a Kolmogorov–Smirnov test of the goodness of fit for a standard normal distribution.

(a) The first step in performing this test is to transform our data set returns,

Un <- pnorm(returns, mean=0, sd=1)

The purpose of this transformation is to produce uniform[0, 1] observations. It is a sort of standardization, and we use it because the KS test is designed to test the null hypothesis that the data is from a uniform distribution.

(b) Our software calculates the upper and lower statistics as

$$D_n^{+} = 0.0599 \qquad D_n^{-} = 0.1314$$

so the test statistic is $D_n = \max(D_n^{+}, D_n^{-}) = 0.1314$. An appropriate approximation for the P value of this test is

$$P = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 n D_n^2} = 2\left[e^{-2(345)(0.1314)^2} - e^{-8(345)(0.1314)^2} + \dots\right] = 1.34 \times 10^{-5}$$

(c) This P value is very small, which means that we should reject the null hypothesis. There is a significant difference between the distribution of our returns and a normal(0,1) distribution. The standard normal distribution is not a good fit for our data.
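The series approximation from part (b) is easy to evaluate directly. The helper `ks_pvalue` below is a hypothetical name introduced for this sketch; it simply truncates the alternating series after a fixed number of terms, which is more than enough here because the terms decay very quickly.

```r
# Asymptotic Kolmogorov-Smirnov P value:
# P = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 n d^2), truncated at `terms`
ks_pvalue <- function(d, n, terms = 100) {
  j <- seq_len(terms)
  p <- 2 * sum((-1)^(j - 1) * exp(-2 * j^2 * n * d^2))
  min(max(p, 0), 1)   # guard against rounding outside [0, 1]
}

# The two-sided statistic is the larger of the one-sided statistics
d_n <- max(0.0599, 0.1314)
p3 <- ks_pvalue(d_n, 345)
print(p3)
```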

4. In question 3, we used the specific null hypothesis of N(0, 1). However, we might not really believe that the mean log return is 0, and so we decide to subtract the sample mean $\bar{X}$ from all of the data first. This centers the data so that it now has mean 0.

(a) We expect the KS test statistic to be smaller using the data with the mean removed because subtracting the mean makes the mean of the result 0. The result is data that looks more like a standard normal, and therefore the distance between the distribution of the data and the standard normal is smaller.

For instance, suppose that the actual mean of the data was 100. Then we would expect the D statistic to be very large because the distribution is far away from a normal(0,1). However, if we subtract off the sample mean (which is nearly 100) then the resulting data set will be centered around 0. This will be closer to our null hypothesis, which has mean 0 and variance 1.

(b) The P value from question 3 would no longer be appropriate, so we decide to do a simulation study to approximate the probability. We produce 100,000 random replications of 345 standard normal random variables, and only 4,978 of those replications had a test statistic greater than or equal to our $D_n$.

This is like a binomial experiment where we have 100,000 independent trials and observed X = 4,978 successes. The estimate is

$$\hat{p} = \frac{4978}{100000} = 0.04978$$

Our 95% confidence interval for p in this binomial experiment is

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

which is [0.04843, 0.05113].

This is an estimate of the probability under the null hypothesis that we would exceed $D_n$. Thus, this interval is a confidence interval for our estimated P value.
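A simulation study of this kind, and the binomial interval for its result, might be sketched as follows. The function names `sim_ks_pvalue` and `binom_ci` are hypothetical, the observed statistic `d_obs = 0.05` in the demo call is a made-up value (the centered statistic is not given in the text), and the demo uses far fewer replications than 100,000 to keep it fast. Each replication is centered by its own sample mean, on the assumption that the simulated data should be treated the same way as the real data.

```r
# Estimate P(D >= d_obs) under the null by simulation
sim_ks_pvalue <- function(d_obs, n, reps) {
  d_sim <- replicate(reps, {
    z <- rnorm(n)
    # centre each replication, as was done to the real data
    ks.test(z - mean(z), "pnorm")$statistic
  })
  mean(d_sim >= d_obs)
}

# Normal-approximation confidence interval for a binomial proportion
binom_ci <- function(x, n, z = 1.96) {
  p_hat <- x / n
  p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
}

# Small demo with a hypothetical observed statistic
set.seed(1)
p_demo <- sim_ks_pvalue(d_obs = 0.05, n = 345, reps = 200)
print(p_demo)

# Interval for the counts reported in the text
ci <- binom_ci(4978, 100000)
print(ci)
```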

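The margin of error of this interval is

$$1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

If we set this error to 0.0001, then we can solve for n:

$$0.0001 = 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

$$n = \frac{1.96^2 \, \hat{p}(1 - \hat{p})}{0.0001^2} = \frac{1.96^2 (0.04978)(1 - 0.04978)}{0.0001^2} = 18,170,000$$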
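The same sample-size calculation can be written as a small R function. The name `required_n` is illustrative; it solves the margin-of-error equation for n and rounds up to a whole number of replications.

```r
# Replications needed so that z * sqrt(p(1-p)/n) equals the target margin
required_n <- function(p_hat, margin, z = 1.96) {
  ceiling(z^2 * p_hat * (1 - p_hat) / margin^2)
}

n_needed <- required_n(0.04978, 0.0001)
print(n_needed)
```

Rounded to four significant figures, this agrees with the value of about 18,170,000 quoted above.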

5. We have 25 data observations, and we want to check if they are uniformly distributed between 0 and 10.

0.1 2.4 3.4 5.4 7.3
0.4 2.6 4.3 5.6 7.4
0.4 2.6 4.8 6.0 7.9
1.9 2.6 5.1 6.5 8.1
1.9 3.3 5.2 6.8 8.5

(a) A χ2 goodness of fit test with four intervals to test the null hypothesis that these are from a uniform distribution has expected values that are 25/4 = 6.25.

       0 - 2.5   2.5 - 5   5 - 7.5   7.5 - 10
Obs    6         7         9         3
E      6.25      6.25      6.25      6.25

$$X^2 = \frac{(6 - 6.25)^2}{6.25} + \frac{(7 - 6.25)^2}{6.25} + \frac{(9 - 6.25)^2}{6.25} + \frac{(3 - 6.25)^2}{6.25}$$

$$= 0.01 + 0.09 + 1.21 + 1.69 = 3.00$$

We should compare this to a χ2 distribution with 3 degrees of freedom: $\chi^2_{3, 0.95} = 7.815$. Since 3.00 < 7.815, we do not reject the null hypothesis. This test does not show that there is a statistically significant deviation from the uniform distribution. We conclude that the data is reasonably close to uniform in distribution.
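The binning and the statistic can be reproduced in R as a check. This sketch enters the 25 observations by hand and uses `cut` with the four equal-width intervals; none of the data fall exactly on an interval boundary, so the choice of open or closed endpoints does not matter here.

```r
x <- c(0.1, 0.4, 0.4, 1.9, 1.9, 2.4, 2.6, 2.6, 2.6, 3.3,
       3.4, 4.3, 4.8, 5.1, 5.2, 5.4, 5.6, 6.0, 6.5, 6.8,
       7.3, 7.4, 7.9, 8.1, 8.5)

# Observed counts in the four intervals of width 2.5
obs <- as.vector(table(cut(x, breaks = c(0, 2.5, 5, 7.5, 10))))

# Under Uniform[0, 10] each interval has probability 1/4
expected <- rep(length(x) / 4, 4)

x2 <- sum((obs - expected)^2 / expected)
crit <- qchisq(0.95, df = 3)   # no parameters estimated, so df = 4 - 1

print(obs)
print(c(X2 = x2, critical = crit))
```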

(b) Alternatively, we could use the Kolmogorov–Smirnov goodness of fit test for the data and hypothesis from part (a). First, we transform the data so that the null hypothesis is a Uniform[0, 1]. In this case, that just means dividing by 10. Then, with the transformed values sorted in increasing order, we calculate

$$D^{+} = \max_i \left(\frac{i}{n} - U_i\right) \qquad D^{-} = \max_i \left(U_i - \frac{i - 1}{n}\right)$$

The test statistic D for this test is calculated in table 1. It is D = 0.15.

$$P = 2 \sum_{j=1}^{\infty} (-1)^{j+1} e^{-2 j^2 n D^2}$$

$$= 2\left[\exp\left(-2(25)(0.15)^2\right) - \exp\left(-8(25)(0.15)^2\right) + \exp\left(-18(25)(0.15)^2\right) - \dots\right]$$

$$= 2\,[0.3247 - 0.0111 + 0.00004 - \dots] = 0.6272$$

This is a large P value, so we do not reject the null hypothesis. There is not a significant difference between our data and the uniform distribution.
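The statistic D and the series P value can be computed directly from the sorted data. The sketch below avoids `ks.test` because the data contain ties, and instead applies the definitions of the upper and lower statistics; the series is truncated at 100 terms, an arbitrary cutoff that is far more than needed.

```r
x <- c(0.1, 0.4, 0.4, 1.9, 1.9, 2.4, 2.6, 2.6, 2.6, 3.3,
       3.4, 4.3, 4.8, 5.1, 5.2, 5.4, 5.6, 6.0, 6.5, 6.8,
       7.3, 7.4, 7.9, 8.1, 8.5)

# Transform to Uniform[0, 1] under the null and sort
u <- sort(x) / 10
n <- length(u)
i <- seq_len(n)

# One-sided statistics and their maximum
d_plus  <- max(i / n - u)
d_minus <- max(u - (i - 1) / n)
d <- max(d_plus, d_minus)

# Asymptotic P value from the alternating series
j <- 1:100
p_ks <- 2 * sum((-1)^(j + 1) * exp(-2 * j^2 * n * d^2))

print(c(D.plus = d_plus, D.minus = d_minus, D = d, P = p_ks))
```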

(d) The advantage of the Kolmogorov–Smirnov test over the χ2 test in this problem is that it can detect differences over the whole interval. It does not rely on an arbitrary partition of the sample space.

6. We collected wind speeds on 20 days in November.

73.7 85.8 87.6 113.5 123.4

141.1 153.9 156.2 182.7 182.9

206.1 206.9 278.6 290.1 328.3

339.8 352.3 373.6 381.8 449.9

(a) Assuming that these are independent observations, the estimated probability that the speed is less than 300 is

$$\hat{p} = \frac{\#\text{ less than } 300}{n} = \frac{14}{20} = 0.7$$

(b) A 95% confidence interval for the probability estimate is

$$\hat{p} \pm 1.96 \sqrt{\hat{p}(1 - \hat{p})/n} = 0.7 \pm 1.96 \sqrt{0.7(0.3)/20}$$

This is [0.4992, 0.9008].
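Parts (a) and (b) amount to an empirical proportion and its normal-approximation interval, which can be sketched in R as follows. The vector name `wind` is chosen here for illustration.

```r
wind <- c(73.7, 85.8, 87.6, 113.5, 123.4,
          141.1, 153.9, 156.2, 182.7, 182.9,
          206.1, 206.9, 278.6, 290.1, 328.3,
          339.8, 352.3, 373.6, 381.8, 449.9)

n <- length(wind)

# Empirical probability that the speed is below 300
p_hat <- mean(wind < 300)

# 95% normal-approximation confidence interval
se <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-1, 1) * 1.96 * se

print(p_hat)
print(round(ci, 4))
```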

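(c) With the triangular kernel

$$K_{30}(t) = \begin{cases} \dfrac{1}{30} + \dfrac{t}{900} & -30 < t < 0 \\ \dfrac{1}{30} - \dfrac{t}{900} & 0 < t < 30 \\ 0 & \text{elsewhere} \end{cases}$$

the estimate of the density of the data at 300 is

$$\hat{f}(300) = \frac{1}{20} \sum_{i=1}^{20} K_{30}(X_i - 300) = \frac{1}{20}\left[K_{30}(278.6 - 300) + K_{30}(290.1 - 300) + K_{30}(328.3 - 300)\right]$$

$$= \frac{1}{20}\,[0.00955 + 0.022333 + 0.001888]$$

$$= 0.001689$$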
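The kernel estimate at 300 can be verified numerically. In this sketch, `tri_kernel` is an illustrative name for the triangular kernel with bandwidth 30; only the three observations within 30 of 300 contribute a nonzero term.

```r
wind <- c(73.7, 85.8, 87.6, 113.5, 123.4,
          141.1, 153.9, 156.2, 182.7, 182.9,
          206.1, 206.9, 278.6, 290.1, 328.3,
          339.8, 352.3, 373.6, 381.8, 449.9)

# Triangular kernel with bandwidth h: peak 1/h at 0, zero outside (-h, h)
tri_kernel <- function(t, h = 30) {
  (1 / h - abs(t) / h^2) * (abs(t) < h)
}

# Kernel density estimate at 300: average of kernel values
contrib <- tri_kernel(wind - 300)
f_hat <- mean(contrib)

print(round(contrib[contrib > 0], 6))   # the three nonzero contributions
print(f_hat)
```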

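(d) If this kernel $K_{30}(t)$ has a bandwidth of 30, then the kernel function $K_{90}(t)$ that has a bandwidth of 90 is found from

$$K_h(t) = \frac{1}{h} K_1\!\left(\frac{t}{h}\right)$$

$$K_{90}(t) = \begin{cases} \dfrac{1}{90} + \dfrac{t}{8100} & -90 < t < 0 \\ \dfrac{1}{90} - \dfrac{t}{8100} & 0 < t < 90 \\ 0 & \text{elsewhere} \end{cases}$$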
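The rescaling rule can be checked by building the bandwidth-90 kernel from the bandwidth-30 kernel and comparing it with the explicit piecewise formula. The helper names below are hypothetical, and the grid of test points is an arbitrary choice.

```r
# Bandwidth-30 triangular kernel
k30 <- function(t) (1 / 30 - abs(t) / 900) * (abs(t) < 30)

# Rescale K30 to bandwidth h: K_h(t) = (30/h) * K30(30 t / h),
# which follows from K_h(t) = (1/h) K_1(t/h)
k_h <- function(t, h) (30 / h) * k30(30 * t / h)

# Explicit formula for the bandwidth-90 kernel
k90_direct <- function(t) (1 / 90 - abs(t) / 8100) * (abs(t) < 90)

t_grid <- seq(-100, 100, by = 0.5)
max_diff <- max(abs(k_h(t_grid, 90) - k90_direct(t_grid)))

# A kernel should integrate to 1
area <- integrate(k90_direct, -90, 90)$value

print(c(max.diff = max_diff, area = area))
```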

(e) The advantage of using the wider bandwidth kernel is that it includes more observations and therefore will have a smaller variance.

(f) In order to estimate the standard deviation of this kernel estimator, I decided to try to use a bootstrap methodology.

> sds <- rep(-999,8)
> for( j in 1:8) {
+   # 1000 bootstrap samples of size 20, one per row
+   boot.wind <- matrix(sample(wind.speed,size=20000,replace=TRUE),ncol=20)
+   # triangular kernel (bandwidth 30) evaluated at each resampled value
+   kern.bw <- (1/30 - abs(boot.wind-300)/900)*(abs(boot.wind-300)<30)
+   # density estimate at 300 for each bootstrap sample
+   hat.f <- rowMeans(kern.bw)
+   sds[j] <- sd(hat.f)
+ }
> signif(sds,4)
[1] 0.001104 0.001158 0.001137 0.001154 0.001133 0.001200 0.001155 0.001171
> mean(sds)
[1] 0.001151571
> sd(sds)
[1] 2.817011e-05

Our bootstrap estimator of sd(fˆ) is 0.001151571 from the code.

An approximate margin of error for this estimator can be computed from the batches.

s 2.817 × 10 —5