闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

STAT4027: Advanced Statistical Modelling

Semester 2, 2021

Answer ALL questions.

1. A random variable Y follows the distribution with probability density function (pdf) f (y) = e_ [(y _τ )+e- (g -T ) , y ← R, τ > 0. (1)

The mean and variance of Y are respectively given by

E(Y) = τ + γ (γ = 0.5772) and Var(Y) =

6 .

(a) Show that f (y) can be written as

f (y) = exp ┌ ~~T (y~~一φ)~~b(θ)~~ + c(φ, y), .

Identify θ , b(θ), a(φ), c(φ, y) and the suﬃcient statistics T (y).

(b) Find E(T (Y)) = b\ (θ) and Var(T (Y)) = a(φ)b\\ (θ).

(i) Write down the loglikelihood function a(τ ; z) = z ln f (yi ) and evaluate its ﬁrst i=1

and second order derivative functions, a\ (τ ; z) and a\\ (τ ; z). (ii) Derive the maximum likelihood (ML) estimate and the variance of .

(iii) Show that in (ii) agrees with the result E(T (Y)) = b\ (θ) in (b).

1 n 1 n

(iv) Find an expression of AIC in terms of = z yi and = z zi , where

zi = e_yi .

(d) Using the results in (2),

(i) Derive the method of moment (MOM) estimate and the variance of . (ii) Show that the MOM estimate is less eﬃcient than the ML estimate in (c)(ii).

2. The data dat contain 15 patients under two diﬀerent treatments.

Patient	Survival or censor	Outcome	Treatment	Age (years)
i	time (days) ti	wi	ci	xi
1	1	Died	1	75
2	1	Died	1	79
3	4	Died	1	85
4	5	Died	1	76
5	6	Censored	1	66
6	8	Died	0	75
7	9	Censored	1	72
8	9	Died	1	70
9	12	Died	0	71
10	15	Censored	0	73
11	22	Died	1	66
12	25	Censored	0	73
13	37	Died	0	68
14	55	Died	0	59
15	72	Censored	0	61

Vectors t, w (1 for died; 0 for censored), treatment and age store the values of ourvivrl or 6ensor time, Оut6ome, srertment and Age respectively.

(a) The ﬁrst study investigates the overall survival rate. The cumulative hazard is

plotted in the following ﬁgure.

> Sfit=survfit(Surv(t,w)~1,data=dat)

> ggsurvplot(Sfit,conf.int = TRUE ,ggtheme=theme_bw(),fun="cumhaz")

(i) The vector of observed cumulative hazard H is calculated in the hidden procedure ‘???????’ using the observed survival rate S from survfit. The value of y[2] is also hidden by ‘*******’ . Calculate the hidden value.

> S=summary(Sfit)$surv; round(S,4)

[1] 0 .8667 0 .8000 0 .7333 0 .6600 0 .5867 0 .5029 0 .4023 0 .2682 0 .1341

> H=???????; round(H,4) #calculate obs cum hazard using S

[1] 0 .1431 ******* 0 .3102 0 .4155 0 .5333 0 .6874 0 .9106 1 .3161 2 .0092

(ii) From the above cumulative hazard plot, there seems to be two linear relation-

ships. Hence a change point model is ﬁtted to the observed cumulative hazard H against time which is outputed as below.

> time=summary(Sfit)$time

> Hlm=lm(H~time)

> library("segmented")

> Hfit=segmented(Hlm,~time) #1 change pt; 2 lines

> round(summary(Hfit)$coef[1:3,c(1,4)],4)

Estimate Pr(>|t|)

(Intercept) 0 .0715 0 .2727

time 0 .0469 0 .0043

U1 .time -0 .0162 0 .1555

> round(summary(Hfit)$psi,4)

Initial Est . St .Err

psi1 .time 9 11 .7511 5 .0535

Write down the hazard function h(t) showing all the necessary steps.

(iii) Provide two reasons why one can approximate the ﬁrst intercept 0.0715 of Hfit

to be zero. Write down the cumulative hazard function H(t) showing all neces- sary steps.

The plot of the observed and ﬁtted cumulative hazard functions is provided below.

Fitted and observed cumulative hazard

time

(b) Assume that the survival time ti follows Loglogisti6 yLLA 户isk( distribution:

αλ(λt)α _ 1 (λt)α

ti ~ LL(α, λ) with pdf fLL (t) = [1 + (λt)α )]2 and cdf FLL (t) = 1 + (λt)α .

(i) Derive the baseline hazard hLL ,0 (t) and cumulative hazard HLL ,0 (t) functions without any regression model.

(ii) It is know that if T follows the LL distribution, Y = log T follows the Logistic

(L) distribution with cdf

FL (y|µ, σ2 ) = exp( )

where σ is the scale parameter, µ = bT b is the mean function, b is a vector of covariates and b is a vector of regression parameters.

Derive the survival function SLL (t) for the LL model in terms of µ and σ starting with

SLL (t) = Pr(T > t) = Pr(log T > log t) = 1 一 FL (log t) = ... (3)

(iii) Let µ = b0 + b1 x. Using the result that

exp ╱ ~~log~~ σ(t)一 µ 、= t exp(一b0 ) exp(一b1 x/σ) = (λt)α exp(β1 x)

where α = , λ = exp(一b0 ), λα = exp(一b0 /σ) and β1 = 一b1 /σ ,

derive HLL (t) using SLL (t) in (3) and hLL (t) using HL (t) with the regression model µ = b0 + b1 x.

Hence explain that the LL model is not a proportional hazard model, that is, HLL (t) HLL ,0 (t) exp(β1 x) and hLL (t) hLL ,0 (t) exp(β1 x)

using the results in (b)(i).

(iv) Explain if the Poisson proxy model wi ~ Poisson(µp,i), µp,i = HLL (ti ) can

be applied to estimate the parameters α , β0 and β1 . Discuss any diﬃculties associated with this method and suggest a workable method.

> summary(survreg(Surv(t,w)~treatmemt+age,dist="loglogistic"))

Call:

survreg(formula = Surv(t, w) ~ treatment + age, dist = "loglogistic") Value Std . Error z p

(Intercept) 11 .9044

treatment age Log(scale)

-0 .9258 -0 .1239

-0 .7973

2 .6594 4 .48 7 .6e-06

0 .4677 -1 .98 0 .0478

0 .0385 -3 .22 0 .0013

0 .2645 -3 .01 0 .0026

Scale= 0 .451

Log logistic distribution

Loglik(model)= -35 .4 Loglik(intercept only)= -42 .9

Chisq= 14 .99 on 2 degrees of freedom, p= 0 .00056 Number of Newton-Raphson Iterations: 6

n= 15

(i) Calculate the mean µ for the patient with trertment and age b). Using the survival function SLL (t) in (3) that you have derived, evaluate the survival rate for this patient at the time t = 50.

(ii) The following plot provides the survival functions for age 65, 75 and 85 and with

and without treatment. State and explain brieﬂy the age and treatment status for the blue dash line.

Survival functions

	●

survival time (days)

3. An automobile accident data contains n = 6274 injuries o。served over four factors:

1. Gender (G):

2. Location (L):

3. Seat belt (S):

4. Injury (I):

Female (F) or Male (M) (denoted by αi in the model formula);

Urban (U) or Rural (R) (denoted by βj in the model formula);

No (N) or Yes (Y) (denoted by γk in the model formula);

0 not hospitalised; 1 hospitalised but did not die; 2 died (denoted by δl in the model formula).

Two Poisson loglinear models, Model A (mA) and Model B (mB) given below:

Model A: ln(µijkl ) = µ + αi + βj + γk + δl + (αγ)ik + (αδ)il + (γδ)kl + (βδ)jl + (αγδ)ikl (4)

Model B: ln(µijkl ) = µ + αi + βj + γk + δl + (αγ)ik + (αδ)il + (γδ)kl + (βγ)jk (5)

are applied where i, j, k = 0, 1 and l = 0, 1, 2. The baseline categories are “female”, “urban”, “no” seat belt and “0” level of injury with i, j, k, l = 0.

(a) For each of Model A and B, explain brieﬂy if it is de6omposr。le using suﬃcient con-

ﬁgurations and if it is decomposable, describe the dependen6y stru6ture and express pijkl , where pijkl 全 µijkl /n, in terms of some marginal probabilities.

(b) The parameter estimates for Model A are given by

> mA=glm(y~G*S*I+L*I,family=poisson) #y,G,S,I,L are 24x1 vectors

> pA=mA$coef; round(pA,3)

(Intercept) G1 S1 I1 I2 6 .7177 -0 .0774 -0 .2094 -2 .2433 -4 .4349

L1 G1:S1

0 .0287 -0 .4935

G1:I1 G1:I2 S1:I1 S1:I2 I1:L1 I2:L1 G1:S1:I1 G1:S1:I2

0 .2049 0 .4414 -0 .4445 -0 .2853 0 .5861 1 .1287 0 .2080 -0 .5244

A 。inomirl model with logit link using Model A and the sert 。elt (S; γk ) factor as the out6ome variable is applied. Parameters at baseline levels are set to 0.

pij1l

bols of the loglinear Model A in (4) with suitable subscripts and adding super- scripts “*” to the parameters to distinguish them from those in the loglinear model. Hint: look for the terms that involve γ ﬁrst.

(ii) The logit model in (i) includes the term

α 1(*) = ln p1010 /p1000

(6)

1010

1000

(i = 0). State which term in the loglinear Model A in (4) is deﬁned in the p0000

same way as (6). Hence provide the estimated value of α 1(*) .

(iii) Similarly, a higher order interaction term in the logit model in (i) also equals to a term in the loglinear Model A in (4). Write down the symbol of this higher order interaction term in the logit model and its estimated value. Interpret the parameter value of this higher order interaction term.

pijkl A A A

where i, j, k = 0, 1, l = 0, 1, 2 is applied and the parameters δl(A), αil(A), γkl(A) are 0 for the baseline level if any subscripts i, k, l are 0. The parameter estimates are given below:

> library(nnet)

> nB=multinom(ym~G+S) #ym is a 8x3 matrix; G,S are 8x1 factor vectors > round(summary(nB)$coefficients,4)

(Intercept) G1 S1

y1 -1 .9389 0 .2704 -0 .3471

y2 -3 .6320 0 .3025 -0 .5160

For an accident with a mrle driver (G; i = 1) in rny (L; j) area not wearing a seat belt (S; k = 0), estimate the probabilities for all injury levels (I; l = 0, 1, 2) as follow.

(i) Calculate the probability ratios and