Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

2023 554 SUMMER Package R Notes

Departments of Biostatistics and Statistics

2023-02-16

Small Area Estimation (SAE)

In these notes, SAE via the SUMMER package will be illustrated.

Details on SUMMER, including a vignette, can be found at https://cran.r-project.org/web/packages/SUMME R/index.html.

We illustrate with the Washington State BRFSS diabetes example and will obtain:

Naive estimates

Weighted estimates

Estimates from a binomial BYM2 model

Estimates from Fay-Herriot models

Load SUMMER package

We rst load the SUMMER package.

This package also depends on INLA, so we need to make sure INLA is installed.  Note that INLA is not on CRAN so it has a special installation process. Here, we check if INLA is available and install it if it is not.  library (SUMMER)

if (!isTRUE (requireNamespace ( "INLA" ,  quietly  =  TRUE)))  {

install .packages("INLA" ,

repos  =  c (getOption("repos"),

INLA="https://inla.r-inla-download.org/R/stable "), dep=TRUE)

}

Read in Data

BRFSS contains the full BRFSS dataset with 16,283 observations:

diab2 variable is the binary indicator of Type II diabetes

strata is the strata indicator and

rwt_llcp is the nal weight.

For the purpose of this analysis, we rst remove records with missing HRA code or diabetes status from this dataset.

data (BRFSS)

BRFSS  <- subset (BRFSS, !is .na (BRFSS$diab2))

BRFSS  <- subset (BRFSS, !is .na (BRFSS$hracode))

KingCounty contains the map of the King County HRAs. In order to t spatial smoothing model, we rst need to compute the adjacency matrix for the HRAs, mat, and make sure both the column and row names

correspond to the HRA names.

library(sf) # Load sf for spatial analysis

library(prioritizr) # Allows us to create an adjacency matrix

data(KingCounty)

KingCounty  <- st_as_sf(KingCounty)

mat  <- adjacency_matrix(KingCounty)

colnames(mat)  <- rownames (mat) <- KingCounty$HRA2010v2_

mat  <- as .matrix(mat[1:dim(mat)[1], 1:dim(mat)[1]])

mat[1 :2 ,  1:2]

## Auburn-North Auburn-South

## Auburn-North 0 1

## Auburn-South 1 0

Direct weighted estimates using the survey package

We load the survey package and then define the survey object for the BRFSS data.  We have stratified, disproportionate sampling, so note the arguments:

weights

strata

We then calculate the direct (weighted) estimates. These are also known as the Horvitz-Thompson estimates. library(survey)

design  <- svydesign(ids = ~ 1 , weights = ~rwt_llcp, strata = ~strata,

data  =  BRFSS)

direct  <- svyby(~diab2, ~hracode, design, svymean)

head(direct,  n  =  7)

##

## Auburn -North

## Auburn -South

## Ballard

## Beacon/Gtown/S .Park

## Bear Creek/Carnation/Duvall Bear Creek/Carnation/Duvall 0 .05166773 0 .01190146

## Bellevue-Central

## Bellevue-NE

Bellevue-Central 0 .05914082 0 .01485885

Bellevue-NE 0 .05772789 0 .01509705

Binomial spatial smoothing model

We ignore the design and t the model:

yi Ipi      ~   Binomial(ni , pi )

θi      =   log = α + bi

with bi  following a BYM2 model, i.e., an iid normal random effect and an intrinsic CAR (ICAR) random effect.

The smoothSurvey function in the SUMMER package

The binomial smoothing model is t with the smoothSurvey function in the SUMMER package by specifying NULL for the survey characteristics, i.e. strata, weights, and cluster variables. For this example we are using a

BYM2 spatial effect, so we include the polygon information and the adjacency matrix in the geo and Amat arguments - this is required for the ICAR component.

smoothed  <- smoothSurvey(data = BRFSS, geo = KingCounty, Amat = mat,

responseType  =  "binary" ,  responseVar  =  "diab2" ,  strataVar  =  NULL ,

weightVar  =  NULL ,  regionVar  =  "hracode" ,  clusterVar  =  NULL ,

CI  =  0.95)

The usual INLA summaries can be found in smoothed$fit:

smoothed$fit$summary .fixed

## mean sd 0 .025quant 0 .5quant 0 .975quant mode

## (Intercept) -2 .353603 0 .03303589 -2 .419006 -2 .353481 -2 .288934 -2 .353278

## kld

## (Intercept) 1 .431232e -09

smoothed$fit$summary .hyper

## mean sd 0 .025quant 0 .5quant ## Precision for region .struct 15 . 1709723 4 .949984 7 . 7539676 14 .4056668 ## Phi for region .struct 0 .8558337 0 . 137880 0 .4888591 0 .9009513 ## 0 .975quant mode

## Precision for region .struct 27 .0305684 12 .9899024

## Phi for region .struct 0 .9955581 0 .9923834

Now examine some of the other components:

names (smoothed)

## [1] "HT" "smooth" "smooth .overall" "fit"

## [5] "CI" "Amat" "responseType" "formula"

## [9] "msg"

names (smoothed$HT)

## [1] "region" "HT .est" "HT .var" "HT .logit .est"

## [5] "HT .logit .var" "HT .logit .prec" "n" "y"

names (smoothed$smooth)

"var"

"logit .var"

head(smoothed$HT,  n  =  4)

## region HT .est

## 1 Auburn-North 0 . 14028777

## 2 Auburn-South 0 .23204420

## 3 Ballard 0 .06666667

## 4 Beacon/Gtown/S .Park 0 .08571429

## HT .logit .prec n y

## 1 33 .52878 278 39

## 2 32 .25414 181 42

## 3 34 .53333 555 37

## 4 16 .45714 210 18

The smoothed estimates of pi  and θi  can be found in the smooth object returned by the function, and the direct estimates are stored in the HT object (without specifying survey weights, these are the simple binomial probabilities, i.e. naive direct estimates).

head(smoothed$smooth,  n  =  1)

## region mean var median lower upper logit .mean

## 1 Auburn -North 0 . 1352512 0 .0002478172 0 . 1344345 0 . 1067145 0 . 1684375 - 1 .862334

## logit .var logit .median logit .lower logit .upper

## 1 0 .01796179 - 1 .862617 -2 . 124403 - 1 .598306


head(smoothed$HT,  n  =  1)

## region HT .est HT .var HT .logit .est HT .logit .var HT .logit .prec

## 1 Auburn -North 0 . 1402878 0 .0004338385 - 1 .812902 0 .02982513 33 .52878

## n y

## 1 278 39

We map the posterior mean estimates for the binomial smoothing model.

data(KingCounty)

toplot  <- smoothed$smooth

mapPlot (data  =  toplot,  geo  =  KingCounty,  variables  =  c ("mean"),

labels  =  c ( "Posterior  Mean"),  by .data  =  "region" ,  by .geo  =  "HRA2010v2_ ")


Posterior Mean

We map the naive direct estimates, which are available in the smoothSurvey t.

toplot$HTest  <- smoothed$HT$HT .est

mapPlot (data  =  toplot,  geo  =  KingCounty,  variables  =  c ( "HTest"),

labels  =  c ( "Naive  Direct  Estimates"),  by .data  =  "region" ,

by .geo  =  "HRA2010v2_ ")


Naive Direct Estimates

Now map the lower and upper endpoints of 95% CI for direct estimates.

lo  <- smoothed$HT$HT .est - 1.96 * sqrt(smoothed$HT$HT .var)

hi  <- smoothed$HT$HT .est + 1.96 * sqrt(smoothed$HT$HT .var)

toplot$HTlower  <- lo

toplot$HTupper  <- hi

mapPlot (data  =  toplot,  geo  =  KingCounty,  variables  =  c ( "HTlower" ,            "HTupper"),  labels  =  c ( "Naive  Direct  Lower" ,  "Naive  Direct Upper"),



by.data  =  "region" ,  by .geo  =  "HRA2010v2_ ")


Naive Direct Lower


Value



0.2


0.1


And also map the posterior coefficient of variation (CV) for the smoothed estimates, which we compute as 100 x sd(pi Iy)/匝[pi Iy], i.e. the posterior standard deviation relative to the posterior mean. The CV is often a more easily interpretable uncertainty measure than the posterior standard deviation.

toplot$cv  <- sqrt(smoothed$smooth$var)/smoothed$smooth$mean

mapPlot (data  =  toplot,  geo  =  KingCounty,  variables  =  c ("cv"),

labels  =  c ( "Posterior  CV,  Smoothed  Estimates"),  by .data  =  "region" ,

by .geo  =  "HRA2010v2_ ")


Posterior CV, Smoothed Estimates