Inference for beta
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Inference for beta
2022
In this R illustration, we will explore the various inference techniques for beta (i.e., standard errors, confidence intervals, and hypothesis tests)
To this end, we will use the beetles data set, recording the number of dead beetles out of n at various concentrations of an insecticide (logdose)
beetles <- read.table("../Datasets/Beetles2.dat" ,header=TRUE)
attach(beetles)
beetles
## ## 1 ## 2 ## 3 ## 4 ## 5 ## 6 ## 7 ## 8 |
logdose 1.691 1.724 1.755 1.784 1.811 1.837 1.861 1.884 |
n 59 60 62 56 63 59 62 60 |
dead 6 13 18 28 52 53 61 60 |
We will first fit a binomial GLM as before, but record the model matrix in the output of the glm function. logitmod <- glm(cbind(dead,n-dead)~logdose,family=binomial,x=TRUE)
summary (logitmod)
##
## Call:
## glm(formula = cbind(dead, n - dead) ~ logdose, family = binomial,
## x = TRUE)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5878 -0.4085 0.8442 1.2455 1.5860
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -60.740 5.182 -11.72 <2e-16 ***
## logdose 34.286 2.913 11.77 <2e-16 ***
## ---
## Signif. codes: 0 !*** ! 0.001 !** ! 0.01 !* ! 0.05 ! . ! 0.1 ! ! 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.202 on 7 degrees of freedom
## Residual deviance: 11.116 on 6 degrees of freedom
## AIC: 41.314
##
## Number of Fisher Scoring iterations: 4
To calculate the standard errors by hand, we can calculate the Fisher information matrix and invert it, as follows.
t(logitmod$x)
## 1 2 3 4 5 6 7 8
## (Intercept) 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## logdose 1.691 1.724 1.755 1.784 1.811 1.837 1.861 1.884
## attr(,"assign")
## [1] 0 1
I <- t(logitmod$x)%*%diag(logitmod$weights)%*%logitmod$x
I
## (Intercept) logdose
## (Intercept) 58.46864 103.9678
## logdose 103.96781 184.9914
I.inv <- solve (I)
I.inv
## (Intercept) logdose
## (Intercept) 26.85178 -15.09108
## logdose -15.09108 8.48681
The standard errors are then the square roots of the diagonal elements of I.inv:
sd <- sqrt(diag(I.inv))
sd
## (Intercept) ## 5.181870
logdose
2.913213
which we can of course compare to the output of the glm function:
#summary(logitmod)$coefficients[,2] #another way of doing this if you don't have the arm library
library(arm)
se.coef(logitmod)
## (Intercept) ## 5.181870
logdose
2.913213
Now we can test whether each beta is 0 using Wald tests. Alternatively, you can use pnorm to calculate the p-values.
beta <- logitmod$coefficients
beta/sd #Wald statistics
## (Intercept) ## -11.72166
logdose
11.76911
p.val <- pchisq((beta/sd)^2 ,df=1 ,lower.tail=FALSE)
p.val #p-values
## (Intercept) logdose
## 9.871561e-32 5.631446e-32
and again compare it to the Wald statistics and p-values in the output of the glm function:
summary (logitmod)$coefficients
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -60.74013 5.181870 -11.72166 9.871561e-32
## logdose 34.28593 2.913213 11.76911 5.631446e-32
We can also calculate Wald confidence intervals for beta_1 and beta_2
z <- qnorm (0.975)
c.upper <- beta+z*sd
c.lower <- beta-z*sd
CI <- cbind(c.lower,c.upper)
colnames (CI)<-c ( "2.5 %" , "97.5 %")
CI
## 2.5 % 97.5 %
## (Intercept) -70.89641 -50.58386
## logdose 28.57614 39.99572
How about testing whether several betas are equal to zero at the same time using a Wald test? To make it more interesting, we will fit a model with logdose and the square of the logdose as a predictor.
logdose.sq <- (logdose)^2
logitmod2 <- glm(cbind(dead,n-dead)~logdose+logdose.sq,family=binomial,x=TRUE) summary(logitmod2)
##
## Call:
## glm(formula = cbind(dead, n - dead) ~ logdose + logdose.sq, family = binomial,
## x = TRUE)
##
## Deviance Residuals:
## 1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
## -0.4208 ## |
0.8177 |
-0.2843 |
-0.5269 |
0.8966 |
-0.8433 |
0.1716 |
0.7229 |
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 426.39 181.22 2.353 0.0186 *
## logdose -515.30 205.15 -2.512 0.0120 *
## logdose.sq 154.92 58.04 2.669 0.0076 **
## ---
## Signif. codes: 0 !*** ! 0.001 !** ! 0.01 !* ! 0.05 ! . ! 0.1 ! ! 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 284.2024 on 7 degrees of freedom
## Residual deviance: 3.2711 on 5 degrees of freedom
## AIC: 35.469
##
## Number of Fisher Scoring iterations: 4
Here is how we can test jointly that the coefficients of logdose and logdose.sq are both zero using a Wald test. We will first calculate the covariance matrix of (βˆ2 , βˆ3 ) like this:
I <- t(logitmod2$x)%*%diag(logitmod2$weights)%*%logitmod2$x
I
## (Intercept) logdose logdose.sq
## (Intercept) ## logdose ## logdose.sq
58.14448 102.8665 102.86654 182.0982 182.09818 322.5535
182.0982
322.5535
571.6912
I.inv <- solve (I)
I.inv
## (Intercept) logdose
## (Intercept) 32839.69 -37172.67
## logdose -37172.67 42086.42
## logdose.sq 10512.89 -11905.11
I.inv0 <- I.inv[2:3 ,2:3]
I.inv0
logdose.sq
10512.893
-11905.109
3368.352
## logdose logdose.sq
## logdose 42086.42 -11905.109
## logdose.sq -11905.11 3368.352
I0 <- solve (I.inv0)
I0
## logdose logdose.sq
## logdose 0.1114065 0.3937552
## logdose.sq 0.3937552 1.3919859
Then we will check how many degrees of freedom I0 has (here it is clear that dof must be 2 because I.inv0 is invertible, so you can skip this if you don’t have the Matrix library:
library (Matrix)
dof <- as.numeric(rankMatrix(I.inv0))
dof
## [1] 2
Now we are ready to calculate the Wald statistic and the p-value:
beta <- coef(logitmod2)[2:3]
W <- t(matrix(beta,nrow=2))%*%I0%*%matrix(beta,nrow=2) #Wald statistic
W
## [,1]
## [1,] 122.912
pchisq (W,df=dof,lower.tail=FALSE) #p-value
## [,1]
## [1,] 2.041719e-27
Extra material that is not part of the R sheet for week 5
We can also estimate the probability at a given logdose from the original model logitmod, say 1.8.
eta <- coef(logitmod)[1] + 1.8*coef(logitmod)[2]
mu <- exp (eta)/(1+exp (eta))
mu
## (Intercept)
## 0.7260234
and calculate the confidence interval, like this:
# inverse Fisher information matrix of the original model:
I <- t(logitmod$x)%*%diag(logitmod$weights)%*%logitmod$x
I
## (Intercept) logdose
## (Intercept) 58.46864 103.9678
## logdose 103.96781 184.9914
I.inv <- solve (I)
I.inv
## (Intercept) logdose
## (Intercept) 26.85178 -15.09108
## logdose -15.09108 8.48681
X <- matrix(c ( 1 , 1.8),nrow=1)
#confidence interval for eta
se.eta <- X%*%I.inv%*%t(X)
se.eta
## [,1]
## [1,] 0.02114354
ci.eta.upper <- eta + qnorm (0.975)*se.eta
ci.eta.lower <- eta - qnorm (0.975)*se.eta
c (ci.eta.lower,ci.eta.upper)
## [1] 0.9330987 1.0159799
eta
## (Intercept)
## 0.9745393
# confidence interval for mu
c (invlogit(ci.eta.lower),invlogit(ci.eta.upper))
## [1] 0.7177035 0.7341888
mu
## (Intercept)
## 0.7260234
2022-02-23