COURSEWORK 3 (OF 4) FOR MATH69531 GENERAL INSURANCE 2022/2023

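1. For any two sequences of numbers $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$ we denote

$$s_{ab} := \sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b}) \in \mathbb{R}, \quad s_{aa} := \sum_{i=1}^{n} (a_i - \bar{a})^2 \geq 0 \quad \text{and} \quad s_{bb} := \sum_{i=1}^{n} (b_i - \bar{b})^2 \geq 0,$$

where as usual $\bar{a}$ and $\bar{b}$ denote the (sample) means. Provided that $s_{aa} > 0$ and $s_{bb} > 0$ (i.e. neither sequence consists of identical numbers), the sample correlation coefficient is defined as:

$$r_{ab} = \frac{s_{ab}}{\sqrt{s_{aa} s_{bb}}} \in [-1, 1].$$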

You probably remember that $r_{ab}$ reflects how strong the linear relationship between the two sequences is. A value close to $-1$ or $1$ suggests a strong linear relationship, and the extreme case $|r_{ab}| = 1$ occurs for instance if $c \neq 0$ and $d$ exist so that $a_i = c b_i + d$ for all $i = 1, \dots, n$. On the other hand, a value close to 0 (resp. equal to 0) suggests hardly any (resp. no) linear relationship (this does not necessarily mean there can't be other types of relationships, of course).
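As a small numerical illustration of the two extreme cases just described, here is a minimal Python sketch. The function name `sample_correlation` and the example sequences are assumptions made for illustration only; they are not part of the coursework or the course notes.

```python
import math


def sample_correlation(a, b):
    """Return s_ab / sqrt(s_aa * s_bb) for two equal-length sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    # Sums of centred products and centred squares, as in the definitions above.
    s_ab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    s_aa = sum((x - a_bar) ** 2 for x in a)
    s_bb = sum((y - b_bar) ** 2 for y in b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("undefined when a sequence consists of identical numbers")
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical example data (not taken from the coursework).
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
a_linear = [-3.0 * x + 5.0 for x in b]  # a_i = c*b_i + d with c = -3, d = 5
a_square = [x ** 2 for x in b]          # exact, but non-linear, relationship

r_linear = sample_correlation(a_linear, b)  # |r| = 1 up to rounding error
r_square = sample_correlation(a_square, b)  # 0 up to rounding error
print(r_linear, r_square)
```

The second sequence shows why a correlation of 0 only rules out a linear relationship: here $a_i = b_i^2$ holds exactly, yet the sample correlation vanishes.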

***

Now consider a set of $n$ points in $\mathbb{R}^3$ denoted

$$(z_1, w_1, y_1), (z_2, w_2, y_2), \dots, (z_n, w_n, y_n)$$

so that $s_{zz} > 0$, $s_{ww} > 0$ and $|r_{zw}| \neq 1$. We assume that these $n$ points are observations from a Linear Model of the form

$$Y_i = \beta_0 + \beta_1 z_i + \beta_2 w_i + \varepsilon_i \quad \text{for } i = 1, \dots, n, \qquad (1)$$

where $\beta_0, \beta_1, \beta_2$ are unknown parameters and as usual the $\varepsilon_i$'s are iid zero-mean random variables with common (unknown) variance $\sigma^2$.

Note that we may equivalently write (1) as

$$Y_i = \alpha + \beta_1 (z_i - \bar{z}) + \beta_2 (w_i - \bar{w}) + \varepsilon_i \quad \text{for } i = 1, \dots, n, \qquad (2)$$

provided we set $\alpha := \beta_0 + \beta_1 \bar{z} + \beta_2 \bar{w}$. This is an equally valid Linear Model representation of the data, with unknown parameters $\alpha, \beta_1, \beta_2$ and predictors (in our standard notation) $x_{i1} = 1$, $x_{i2} = z_i - \bar{z}$ and $x_{i3} = w_i - \bar{w}$ for $i = 1, \dots, n$. Representation (2) turns out to be easier to work with and is hence recommended for answering the questions below.
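To see why the two representations agree, one can add and subtract the terms involving the sample means $\bar{z}$ and $\bar{w}$ in the mean of (1). A short worked version, using only the symbols already introduced:

```latex
\begin{align*}
\beta_0 + \beta_1 z_i + \beta_2 w_i
  &= \bigl(\beta_0 + \beta_1 \bar{z} + \beta_2 \bar{w}\bigr)
     + \beta_1 (z_i - \bar{z}) + \beta_2 (w_i - \bar{w}) \\
  &= \alpha + \beta_1 (z_i - \bar{z}) + \beta_2 (w_i - \bar{w}).
\end{align*}
```

The slopes $\beta_1$ and $\beta_2$ are the same in both representations; only the intercept is relabelled.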

(a) Write down the design matrix for the model (2).

[2 marks]

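(b) Denote by $\hat{\beta}_1$ and $\hat{\beta}_2$ the Least Squares Estimators of $\beta_1$ and $\beta_2$ respectively. Show that

$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{s_{zz} - s_{zw}^2 / s_{ww}} \quad \text{and} \quad \operatorname{Var}(\hat{\beta}_2) = \frac{\sigma^2}{s_{ww} - s_{zw}^2 / s_{zz}}.$$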

[7 marks]

For convenience we give the model a bit more context: suppose that the response $y$ models the amount of time it takes a student to complete this coursework, and that the response depends on $z \in [0, 1]$ (the fraction of their time the student has spent working on this material during the past three weeks) and $w \in [0, 1]$ (a measure of how much they like statistics). Suppose that you are the person conducting this experiment, and that you have a very large pool of students with a large variety of $z$ and $w$ values available from which to make your $n$ measurements $(z_1, w_1, y_1), \dots, (z_n, w_n, y_n)$. (For clarity: the assumptions on the errors $\varepsilon_i$ listed under (1) remain in force.)

(c) [...] students so that $r_{zw}$ is close to $-1$ or $1$.

[6 marks]

(d) Now give your best reason in words, without using (too much) maths and without using the variances computed in part (b), why it is not a good idea to choose your students so that $r_{zw}$ is close to $-1$ or $1$.

[4 marks]

2. On our course Blackboard page, in the folder All things coursework, you can find the file Car data CW3.txt. Download this file to your computer. It contains data related to 1000 car accidents collected by an insurance company for a specific type of car. For each accident the following has been recorded:

Name	Description
MarketValue	value of the car at the time of the accident
Speed	the speed of the car at the time of the accident
SeatbeltIndicator	is 0 if the driver did not have their seatbelt fastened and 1 if they did
DamageAmount	the total amount the insurance company had to pay out, for damage to the car and (possibly) personal injury of the driver

We are looking to fit a Linear Model to this data, where DamageAmount is the response and the other recorded variables can be used to form the predictors. We do this using R, cf. Chapter 11 in the notes.

You are asked to work through the following steps.

1. Write down a first (sensible) guess for a Linear Model for the given data.

2. Use R to fit the Linear Model you have chosen to the data.

3. Discuss what the (relevant) output from R tells you about how well (or badly) the Linear Model you have chosen fits the data.

4. Try to come up with a Linear Model that you expect to provide a better fit to the data. Explain why you think the new model will give a better fit. Then work through points 2. and 3. again for this new model, in 3. also explaining how your second model compares to your first model.

You are very welcome to try more than two models (i.e. to execute step 4. more than once) if you are enjoying yourself, but this is not required.

Note: your answer to this question is not required to contain the 'most perfect' model for this data (insofar as that exists!). Rather, full marks are given if you execute the above 4 steps fully, sensibly and correctly in your answer. In particular, you should include both the R code you produced and the (relevant) output, and you should provide the discussion/motivation that steps 3. and 4. ask for. As in previous courseworks, handwritten working in which you refer to R code/R output printed on separate sheets (attached to your solutions, of course) is perfectly fine.

[18 marks]

2022-12-09
