MATH 3333 3.0 - Winter 2022-23
Assignment 2
Question 1: In maximum likelihood estimation, the estimator is obtained by maximizing the log likelihood function. However, in most cases the log likelihood has to be optimized numerically by the Newton-Raphson algorithm. In this question, we will learn to program the Newton-Raphson algorithm for a univariate function. Consider the function f(θ) = (3θ^2 - 1)/(1 + θ^3), θ > 0. Use the Newton-Raphson method to find the maximizer of the function. Implement the algorithm in R and run your code to obtain the maximizer. (Attach screenshots of the output in R.) You can use an online symbolic differentiation calculator to obtain f′(θ) and f″(θ) (https://www.symbolab.com/solver/second-derivative-calculator).
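For reference, the Newton-Raphson update for maximizing f iterates θ_{k+1} = θ_k - f′(θ_k)/f″(θ_k) until |θ_{k+1} - θ_k| falls below a small tolerance. In the code below, gx, gp and gpp implement f, f′ and f″ respectively.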
gx<-function(x){
p<-(3*x^2-1)/(1+x^3)
return(p)
}
gp<-function(x){
p<-(6*x-3*x^4+3*x^2)/(1+x^3)^2
return(p)
}
gpp<-function(x){
p<-6*(x^6-2*x^4-7*x^3+x+1)/(1+x^3)^3
return(p)
}
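### plot f on (0, 10] to locate a good starting value for Newton-Raphson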
x<-(1:1000)/100
y<-numeric(1000)
for (i in 1:1000){
y[i]<-gx(x[i])
}
plot(x,y,type="l")
### Newton-Raphson iteration: the starting value is 1.5
diff<-4
iter<-0
x<-1.5
while ((diff>0.001) && (iter<30)) {
oldx<-x
x<-x-gp(x)/gpp(x)
diff<-abs(x-oldx)
iter<-iter+1
print(c(iter,diff))
}
gx(x)
[1] 1.00000000 0.02065303
[1] 2.0000000000 0.0007258094
> gx(x)
[1] 1.314596
> x
[1] 1.521379
We choose the starting point to be 1.5 by examining the plot of the function. The maximizer is x = 1.521379, and the maximum value is f(x) = 1.314596.
Question 2: In the following marketing data set, we have 9 years with the sales (in units of 10 million euro) and the advertising expenditure (in million euro).
Year Sales Advertisement
1 61 19
2 73 26
3 85 30
4 106 34
5 120 43
6 129 48
7 142 52
8 144 57
9 161 68
a) Based on the 9 observations, perform a ridge regression. Program it with R. Output the ridge regression results at a few different values of λ.
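The code below evaluates the closed-form ridge estimator θˆ = (X^T X + λI)^{-1} X^T Y directly, where X (xmat in the code) is the design matrix with an intercept column and I is the 2 by 2 identity matrix.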
xmat<-cbind(rep(1,9),c(19,26,30,34,43,48,52,57,68))
y<-c(61,73,85,106,120,129,142,144,161)
lambda=0.1
solve(t(xmat)%*%xmat+lambda*diag(2))%*%(t(xmat)%*%y)
[,1]
[1,] 21.910647
[2,] 2.179345
###now change lambda
lambda=100
solve(t(xmat)%*%xmat+lambda*diag(2))%*%(t(xmat)%*%y)
[,1]
[1,] 0.2989559
[2,] 2.6217873
b) In your ridge regression, when λ increases, what do you observe from the values of the estimated coefficients? Do any of the estimated coefficients shrink to exactly zero, as in the L1-penalized LASSO regression? Describe the difference between the output of a ridge regression and the output of a LASSO regression.
Ridge regression penalizes the L2 norm of the estimates, so the estimates have a smaller L2 norm compared to the least squares estimates. On the other hand, LASSO regression penalizes the L1 norm of the estimates, so the estimates have a smaller L1 norm compared to the least squares estimates. Unlike LASSO, even though the ridge estimates shrink towards zero, they are never exactly zero. LASSO produces sparse estimates where some of the parameters are estimated to be exactly zero.
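A short sketch (reusing xmat and y from part a) that traces this shrinkage over a grid of λ values; both estimates move towards zero as λ becomes very large, but neither is ever exactly zero:
for (lambda in c(0.1, 1, 10, 100, 1000, 10000)) {
est <- solve(t(xmat) %*% xmat + lambda * diag(2)) %*% (t(xmat) %*% y)
print(c(lambda, est))   ### lambda, intercept estimate, slope estimate
}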
Question 3:
a. In this question, we will investigate the problem of multiple testing. Consider the hypothesis testing of H0 : µ = 0 vs Ha : µ ≠ 0. Under the null hypothesis, the Z test statistic is a standard normal random variable. We reject the null hypothesis when |Z| is greater than 1.96 at the significance level of 0.05. Write an R program to simulate 1000 Z test statistics from the standard normal N(0, 1). (You can use the sample code in our lecture notes.)
ztests=rnorm(1000,0,1)
(Note: there are many other ways of coding this. You can also follow my lecture notes and compute each Z test statistic from a simulated data set.)
b. If we perform hypothesis testing at the significance level of 0.05, how many of the 1000 test statistics you generated are rejected?
totalrejection<-sum(abs(ztests)>=1.96)
> totalrejection
[1] 42
c. Apply the Bonferroni method to control the overall Type I error rate to be 0.05. Based on your program, how many rejections do you obtain from your simulation? To use the Bonferroni correction, we have to use the cutoff z_{0.05/(2*1000)}, the upper 0.05/(2*1000) quantile of the standard normal.
bonferroni<-qnorm((1-0.05/(2*1000)),0,1)
totalrejection<-sum(abs(ztests)>=bonferroni)
> totalrejection
[1] 0
d. As the Zs are generated under the null hypothesis, we consider these rejections to all be false positive discoveries. Please use a short paragraph to summarize the problem we face when we perform multiple tests.
As we can see, if we perform many tests at a fixed significance level, the chance of making at least one Type I error accumulates across the tests. To control the overall Type I error rate against committing even one error, we can use Bonferroni's method. However, Bonferroni's method is very stringent: while it protects us from committing even one false rejection, it diminishes our power to detect true positives. Thus, in real high-dimensional applications, people tend to control the overall false discovery rate instead of the overall Type I error rate.
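As an optional illustration (not required by the assignment), the false discovery rate can be controlled with the Benjamini-Hochberg procedure, available in base R through p.adjust:
pvals <- 2 * pnorm(-abs(ztests))        ### two-sided p-values for the 1000 tests
fdr <- p.adjust(pvals, method = "BH")   ### Benjamini-Hochberg adjusted p-values
sum(fdr <= 0.05)                        ### number of discoveries at FDR level 0.05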
Question 4: Consider a data set with the response vector Y = (y1, ..., yn)^T and the data matrix
X =
( 1    x11    x12 )
( ...  ...    ... )
( 1    xn1    xn2 )
We model the relationship between X and Y using the linear regression model: yi = θ0 + θ1 xi1 + θ2 xi2 + εi, i = 1, ..., n, where εi ~ N(0, σ^2). Let the parameter vector be denoted as θ = (θ0, θ1, θ2)^T. We wish to minimize the weighted sum of squared residuals: SSE = Σ_{i=1}^n wi (yi - (θ0 + θ1 xi1 + θ2 xi2))^2. Derive the formula for the solution of θ which minimizes the weighted sum of squared errors.
We wish to minimize the weighted sum of squared residuals SSE = Σ_{i=1}^n wi (yi - (θ0 + θ1 xi1 + θ2 xi2))^2, where wi is the weight for the ith observation. Let W be the diagonal matrix with w1, ..., wn as its diagonal entries. Then
SSE = (Y - Xθ)^T W (Y - Xθ)
    = Y^T W Y - θ^T X^T W Y - Y^T W X θ + θ^T X^T W X θ
    = Y^T W Y - 2 Y^T W X θ + θ^T X^T W X θ.
Derivatives with respect to a vector follow rules that are very similar to the differentiation rules for one variable. Let X be a vector of variables of length n, let Y be a vector of constants of length n, and let A be a symmetric constant matrix of dimension n by n. Then
∂(X^T X)/∂X = 2X,   ∂(Y^T X)/∂X = Y,   ∂(X^T A X)/∂X = 2AX.
Applying these rules to SSE and setting the derivative equal to zero:
∂SSE/∂θ = ∂(Y^T W Y - 2 Y^T W X θ + θ^T X^T W X θ)/∂θ = -2 X^T W Y + 2 (X^T W X) θ = 0,
which gives θ = (X^T W X)^{-1} X^T W Y.
The solution is denoted as θˆ = (X^T W X)^{-1} X^T W Y. This is the weighted least squares solution to the regression model.
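As a quick numerical sanity check (with hypothetical simulated data and arbitrarily chosen weights, not part of the assignment), the closed-form solution can be evaluated in R and compared against lm with a weights argument, which minimizes the same weighted SSE:
set.seed(1)
n <- 10
X <- cbind(1, rnorm(n), rnorm(n))             ### hypothetical design matrix with intercept
y <- as.vector(X %*% c(1, 2, 3) + rnorm(n))   ### hypothetical response
w <- runif(n, 0.5, 2)                         ### hypothetical positive weights
W <- diag(w)
thetahat <- solve(t(X) %*% W %*% X) %*% (t(X) %*% W %*% y)
thetahat
coef(lm(y ~ X[, 2] + X[, 3], weights = w))    ### cross-check: same estimates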
Question 5: Analyze the German data set from the site:
https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).
a) Perform the logistic regression on the dataset. Build a predictive model using some of the predictors in the model. Please use 900 observations as the training set and use your model to predict the default status of the remaining 100 loans. Choose one of the regression coefficients and interpret it. What cutoff value of the probability do you use for your analysis? How many default loans are predicted to be non-default (number of false negatives)? How many non-default loans are predicted to be default (number of false positives)? Then you need to improve your model by adding more predictors, higher order terms, or interaction terms. Please demonstrate that your new model makes fewer errors than your first model on the 100 testing cases.
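A note on setup (an assumption, since the loading step is not shown here): the code below expects the data in a data frame named germancredit with a 0/1 response column Default and named predictors such as duration and amount; the for loop converts the categorical columns to factors. One hypothetical way to load the raw UCI file:
### hypothetical loading step; column names must be assigned to match the course's prepared data
germancredit <- read.table("german.data")
### the raw UCI file codes the class as 1 = good, 2 = bad; recode to 0/1 if needed:
### germancredit$Default <- germancredit$V21 - 1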
index<-c(1,2,4,5,7,8,10,11,13,15,16,18,20,21)
for( i in index){
germancredit[,i]<-factor(germancredit[,i])
}
model<-glm(Default~duration+amount, data=germancredit[1:900,], family=binomial(link=logit))
> summary(model)
Call:
glm(formula = Default ~ duration + amount, family = binomial(link = logit),
data = germancredit[1:900, ])
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4431 -0.8599 -0.7219 1.3020 1.8121
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.636e+00 1.535e-01 -10.659 < 2e-16 ***
duration 3.399e-02 7.771e-03 4.374 1.22e-05 ***
amount 1.295e-05 3.301e-05 0.392 0.695
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1096.1  on 899  degrees of freedom
Residual deviance: 1059.5  on 897  degrees of freedom
AIC: 1065.5
Number of Fisher Scoring iterations: 4
####interpretation of the regression coefficients
###
exp( 1.295e-05)
[1] 1.000013
This means that if the loan amount increases by 1 dollar, the odds of defaulting on the loan are multiplied by exp(1.295e-05) ≈ 1.000013, with the other predictor held fixed.
model1<-glm(Default~., data=germancredit[1:900,], family=binomial(link=logit))
pred<-predict(model,newdata=germancredit[901:1000,],type="response")
predstatus<-pred>=1/2
tp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==1))
tn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==0))
fp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==0))
fn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==1))
> tp
[1] 5
> tn
[1] 66
> fp
[1] 2
> fn
[1] 27
### the number of false negatives is 27 and the number of false positives is 2.
misrate<-(fp+fn)/100
sensitivity<-tp/(tp+fn)
specificity<-tn/(tn+fp)
########
> misrate
[1] 0.29
> sensitivity
[1] 0.15625
> specificity
[1] 0.9705882
#######
pred<-predict(model1,newdata=germancredit[901:1000,],type="response")
predstatus<-pred>=1/2
tp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==1))
tn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==0))
fp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==0))
fn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==1))
misrate<-(fp+fn)/100
sensitivity<-tp/(tp+fn)
specificity<-tn/(tn+fp)
######
> misrate
[1] 0.26
> sensitivity
[1] 0.5625
> specificity
[1] 0.8235294
###### We note that the second model has better performance than the first model.
### If we change the cutoff to 1/6
predstatus<-pred>=1/6
tp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==1))
tn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==0))
fp<-sum((predstatus==1)*(germancredit[901:1000,]$Default==0))
fn<-sum((predstatus==0)*(germancredit[901:1000,]$Default==1))
misrate<-(fp+fn)/100
sensitivity<-tp/(tp+fn)
specificity<-tn/(tn+fp)
> misrate
[1] 0.39
> sensitivity
[1] 0.90625
> specificity
[1] 0.4705882
b) Please investigate how the sensitivity and specificity change with respect to different cutoff values of the probability.
From the results above, we can see that if we lower the cutoff value, the sensitivity will increase, but the specificity will decrease.
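A short sketch that traces this trade-off over a grid of cutoffs (reusing pred and the test set from above):
truth <- germancredit[901:1000,]$Default == 1
for (cutoff in seq(0.1, 0.9, by = 0.1)) {
predstatus <- pred >= cutoff
sens <- sum(predstatus & truth) / sum(truth)      ### true positive rate
spec <- sum(!predstatus & !truth) / sum(!truth)   ### true negative rate
print(c(cutoff, sens, spec))
}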
Question 6:
In logistic regression, we assume Y = (Y1, ..., Yn)^T is a collection of n binary observations. For each Yi, we observe predictors Xi = (Xi1, Xi2, Xi3)^T. We assume
log(pi/(1 - pi)) = Xi^T θ,
where pi = P(Yi = 1) and θ = (θ1, ..., θ3)^T is the vector of regression coefficients.
a) Formulate the overall log likelihood of the dataset, l(Y). l(Y) = Σ_{i=1}^n {(1 - Yi) log(1 - pi) + Yi log pi}.
b) Derive the first derivative ∂l(Y)/∂θ2 .
Using the chain rule, we can obtain ∂l(Y)/∂θ2 = Σ_{i=1}^n (Yi - pi) Xi2.
c) Derive the second derivative ∂^2 l(Y)/∂θ2 ∂θ3.
∂^2 l(Y)/∂θ2 ∂θ3 = -Σ_{i=1}^n pi (1 - pi) Xi2 Xi3.
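(A brief sketch of this step, for completeness: since pi = exp(Xi^T θ)/(1 + exp(Xi^T θ)), differentiating gives ∂pi/∂θ3 = pi(1 - pi)Xi3; substituting this into ∂l(Y)/∂θ2 = Σ_{i=1}^n (Yi - pi)Xi2 yields the expression above.)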
The detailed derivation is not provided here; for the full derivation, please refer to our lecture notes on logistic regression.
d) Suppose θˆ = (0.1, 0.2, 0.3)^T, and we have a new observation X_{n+1} = (3, 2, 4)^T. Predict the probability p of success for this new observation.
X_{n+1}^T θˆ = 3 * 0.1 + 2 * 0.2 + 4 * 0.3 = 1.9
p = exp(1.9)/(1 + exp(1.9)) = 0.8698915.
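A quick check of the arithmetic in R:
theta <- c(0.1, 0.2, 0.3)
xnew <- c(3, 2, 4)
eta <- sum(theta * xnew)      ### linear predictor: 1.9
exp(eta) / (1 + exp(eta))     ### predicted probability: 0.8698915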