闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

2023 554 SUMMER Package R Notes

Departments of Biostatistics and Statistics

2023-02-16

Small Area Estimation (SAE)

In these notes, SAE via the SUMMER package will be illustrated.

Details on SUMMER, including a vignette, can be found at https://cran.r-project.org/web/packages/SUMME R/index.html.

We illustrate with the Washington State BRFSS diabetes example and will obtain:

● Naive estimates

● Weighted estimates

● Estimates from a binomial BYM2 model

● Estimates from Fay-Herriot models

Load SUMMER package

We ﬁrst load the SUMMER package.

This package also depends on INLA, so we need to make sure INLA is installed. Note that INLA is not on CRAN so it has a special installation process. Here, we check if INLA is available and install it if it is not. library (SUMMER)

if (!isTRUE (requireNamespace ( "INLA" , quietly = TRUE))) {

install .packages("INLA" ,

repos = c (getOption("repos"),

INLA="https://inla.r-inla-download.org/R/stable "), dep=TRUE)

}

Read in Data

BRFSS contains the full BRFSS dataset with 16,283 observations:

● diab2 variable is the binary indicator of Type II diabetes

● strata is the strata indicator and

● rwt_llcp is the ﬁnal weight.

For the purpose of this analysis, we ﬁrst remove records with missing HRA code or diabetes status from this dataset.

data (BRFSS)

BRFSS <- subset (BRFSS, !is .na (BRFSS$diab2))

BRFSS <- subset (BRFSS, !is .na (BRFSS$hracode))

KingCounty contains the map of the King County HRAs. In order to ﬁt spatial smoothing model, we ﬁrst need to compute the adjacency matrix for the HRAs, mat, and make sure both the column and row names

correspond to the HRA names.

library(sf) # Load sf for spatial analysis

library(prioritizr) # Allows us to create an adjacency matrix

data(KingCounty)

KingCounty <- st_as_sf(KingCounty)

mat <- adjacency_matrix(KingCounty)

colnames(mat) <- rownames (mat) <- KingCounty$HRA2010v2_

mat <- as .matrix(mat[1:dim(mat)[1], 1:dim(mat)[1]])

mat[1 :2 , 1:2]

## Auburn-North Auburn-South

## Auburn-North 0 1

## Auburn-South 1 0

Direct weighted estimates using the survey package

We load the survey package and then deﬁne the survey object for the BRFSS data. We have stratiﬁed, disproportionate sampling, so note the arguments:

● weights

● strata

We then calculate the direct (weighted) estimates. These are also known as the Horvitz-Thompson estimates. library(survey)

design <- svydesign(ids = ~ 1 , weights = ~rwt_llcp, strata = ~strata,

data = BRFSS)

direct <- svyby(~diab2, ~hracode, design, svymean)

head(direct, n = 7)

## Auburn -North

## Auburn -South

## Ballard

## Beacon/Gtown/S .Park

## Bear Creek/Carnation/Duvall Bear Creek/Carnation/Duvall 0 .05166773 0 .01190146

## Bellevue-Central

## Bellevue-NE

Bellevue-Central 0 .05914082 0 .01485885

Bellevue-NE 0 .05772789 0 .01509705

Binomial spatial smoothing model

We ignore the design and ﬁt the model:

yi Ipi ~ Binomial(ni , pi )

θi = log ╱ 、 = α + bi

with bi following a BYM2 model, i.e., an iid normal random eﬀect and an intrinsic CAR (ICAR) random eﬀect.

The smoothSurvey function in the SUMMER package

The binomial smoothing model is ﬁt with the smoothSurvey function in the SUMMER package by specifying NULL for the survey characteristics, i.e. strata, weights, and cluster variables. For this example we are using a

BYM2 spatial eﬀect, so we include the polygon information and the adjacency matrix in the geo and Amat arguments - this is required for the ICAR component.

smoothed <- smoothSurvey(data = BRFSS, geo = KingCounty, Amat = mat,

responseType = "binary" , responseVar = "diab2" , strataVar = NULL ,

weightVar = NULL , regionVar = "hracode" , clusterVar = NULL ,

CI = 0.95)

The usual INLA summaries can be found in smoothed$fit:

smoothed$fit$summary .fixed

## mean sd 0 .025quant 0 .5quant 0 .975quant mode

## (Intercept) -2 .353603 0 .03303589 -2 .419006 -2 .353481 -2 .288934 -2 .353278

## kld

## (Intercept) 1 .431232e -09

smoothed$fit$summary .hyper

## mean sd 0 .025quant 0 .5quant ## Precision for region .struct 15 . 1709723 4 .949984 7 . 7539676 14 .4056668 ## Phi for region .struct 0 .8558337 0 . 137880 0 .4888591 0 .9009513 ## 0 .975quant mode

## Precision for region .struct 27 .0305684 12 .9899024

## Phi for region .struct 0 .9955581 0 .9923834

Now examine some of the other components:

names (smoothed)

## [1] "HT" "smooth" "smooth .overall" "fit"

## [5] "CI" "Amat" "responseType" "formula"

## [9] "msg"

names (smoothed$HT)

## [1] "region" "HT .est" "HT .var" "HT .logit .est"

## [5] "HT .logit .var" "HT .logit .prec" "n" "y"

names (smoothed$smooth)

"var"

"logit .var"

head(smoothed$HT, n = 4)

## region HT .est

## 1 Auburn-North 0 . 14028777

## 2 Auburn-South 0 .23204420

## 3 Ballard 0 .06666667

## 4 Beacon/Gtown/S .Park 0 .08571429

## HT .logit .prec n y

## 1 33 .52878 278 39

## 2 32 .25414 181 42

## 3 34 .53333 555 37

## 4 16 .45714 210 18

The smoothed estimates of pi and θi can be found in the smooth object returned by the function, and the direct estimates are stored in the HT object (without specifying survey weights, these are the simple binomial probabilities, i.e. naive direct estimates).

head(smoothed$smooth, n = 1)

## region mean var median lower upper logit .mean

## 1 Auburn -North 0 . 1352512 0 .0002478172 0 . 1344345 0 . 1067145 0 . 1684375 - 1 .862334

## logit .var logit .median logit .lower logit .upper

## 1 0 .01796179 - 1 .862617 -2 . 124403 - 1 .598306

head(smoothed$HT, n = 1)

## region HT .est HT .var HT .logit .est HT .logit .var HT .logit .prec

## 1 Auburn -North 0 . 1402878 0 .0004338385 - 1 .812902 0 .02982513 33 .52878

## n y

## 1 278 39

We map the posterior mean estimates for the binomial smoothing model.

data(KingCounty)

toplot <- smoothed$smooth

mapPlot (data = toplot, geo = KingCounty, variables = c ("mean"),

labels = c ( "Posterior Mean"), by .data = "region" , by .geo = "HRA2010v2_ ")

Posterior Mean

We map the naive direct estimates, which are available in the smoothSurvey ﬁt.

toplot$HTest <- smoothed$HT$HT .est

mapPlot (data = toplot, geo = KingCounty, variables = c ( "HTest"),

labels = c ( "Naive Direct Estimates"), by .data = "region" ,

by .geo = "HRA2010v2_ ")

Naive Direct Estimates

Now map the lower and upper endpoints of 95% CI for direct estimates.

lo <- smoothed$HT$HT .est - 1.96 * sqrt(smoothed$HT$HT .var)

hi <- smoothed$HT$HT .est + 1.96 * sqrt(smoothed$HT$HT .var)

toplot$HTlower <- lo

toplot$HTupper <- hi

mapPlot (data = toplot, geo = KingCounty, variables = c ( "HTlower" , "HTupper"), labels = c ( "Naive Direct Lower" , "Naive Direct Upper"),

by.data = "region" , by .geo = "HRA2010v2_ ")

Naive Direct Lower

Value

0.2

0.1

And also map the posterior coeﬃcient of variation (CV) for the smoothed estimates, which we compute as 100 x sd(pi Iy)/匝[pi Iy], i.e. the posterior standard deviation relative to the posterior mean. The CV is often a more easily interpretable uncertainty measure than the posterior standard deviation.

toplot$cv <- sqrt(smoothed$smooth$var)/smoothed$smooth$mean

mapPlot (data = toplot, geo = KingCounty, variables = c ("cv"),

labels = c ( "Posterior CV, Smoothed Estimates"), by .data = "region" ,

by .geo = "HRA2010v2_ ")

Posterior CV, Smoothed Estimates