Homework 3: STA465/2016
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
Homework 3: STA465/2016
Homework 3 is due ≈ on Friday, March 17th (but take an extra day or two if you need it). The homework assignment is worth 25 points in total.
Question 1 (5 pts)
So far, we’ve learned about areal and geospatial data. To model areal data we’ve used Gaussian Markov random fields, and for modeling geospatial data we’ll use Gaussian random fields.
Question 1.1
Define a Gaussian Markov random field and describe construction of the precision matrix.
Question 1.2
Define a Gaussian random field and describe construction of the covariance matrix.
Question 1.3
Compare and contrast a Gaussian Markov random field and a Gaussian random field. What similarities do
they share and how do they differ?
Question 2 (20 pts)
For this question, we will use the malaria data from The Gambia that is available in the geoR R
package. We will follow the example given in https://www.paulamoraga.com/book-geospatial/sec- .geostatisticaldataexamplespatialhtml
## load in the R packages needed and data
library(geoR)
library(sf)
library(tidyverse)
library(sp)
library(rgdal)
library(tmap)
library(raster)
library(geodist)
library (INLA)
library (rSPDE)
data(gambia)
## aggregate by location and convert the object into a 'sf ' object
d <- group_by(gambia, x, y) %>%
summarize(
total = n (),
positive = sum (pos),
prev = positive / total
)
sps <- SpatialPoints(d[, c ("x" , "y")],
proj4string = CRS ("+proj=utm +zone=28")
)
spst <- spTransform(sps, CRS ("+proj=longlat +datum=WGS84"))
d[,c ("long" , "lat")] <- spst@coords
r <- getData(name = "alt" , country = "GMB" , mask = TRUE)
d$alt <- raster ::extract(r, d[, c ("long" , "lat")])
gambia .sf <- st_as_sf(d, coords = c ("long" , "lat"), crs = "+proj=longlat +datum=WGS84")
## map of the prevalence data
tm_shape(gambia .sf) + tm_dots("prev" , title = "Prevalence")
Prevalence
0.0 to 0.2 0.2 to 0.4 0.4 to 0.6
0.6 t 0.8 to 1.0
|
Conditional on the true prevalence P (si ) at location si , i = 1, . . . , n, the number of positive results Yi out of Ni people sampled follows a binomial distribution:
Yi |P(si ) ∼ Binomial (Ni , P (si ))
logit(P (si )) = β0 + β1 × altitude + f(si )
Here β0 denotes the intercept and β 1 is the coefficient of altitude. The spatial random effect f(si ) that follows a zero-mean Gaussian process with one of two covariance functions: exponential or Matérn.
Question 2.1
What is the geometry type and CRS of the The Gambia data set, gambia .sf? Display total and prevalence in a two panel plot.
Question 2.2
We will fit a model to the prevalence data of The Gambia, but first simulate from the prior predictive
distribution. For the covariance matrix we’ll need to compute the distance matrix:
dist .matrix <- geodist(d[,c ("long" , "lat")], measure = "geodesic")
Assuming a Matérn covariance matrix structure, use the following prior distributions to simulate 100 data sets from the prior predictive distribution:
• β0 ∼ N(0, 1), β 1 ∼ N(0, 1), κ ∼ Gamma(shape = 0.1, rate = 1), ν ∼ Gamma(10, 1), σ Gamma(1, 1)
# Some example code
# kappa = range; nu = shape; sigma = standard deviation
rSPDE::matern .covariance(h = dist .matrix, kappa = .001 , nu = 100 , sigma = 1)
For the first six locations in gambia .sf, make a histogram of the prior predictive draws of prevalence, P (si ), and add a vertical line at the observed value.
gambia .sf[1:6,]
## Simple feature collection with 6 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: - 16 .38755 ymin: 13 . 18541 xmax: - 16 .23041 ymax: 13 .53742 ## CRS: +proj=longlat +datum=WGS84
## # A tibble: 6 x 7
## # Groups: x [6]
## |
x y |
total |
positive |
prev |
alt geometry |
## |
<dbl> <dbl> |
<int> |
<dbl> |
<dbl> |
<dbl> <POINT [ °]> |
## 1 |
349631 . 1458055 |
33 |
17 |
0 .515 |
14 (- 16 .38755 13 . 18541) |
## 2 |
358543 . 1460112 |
63 |
19 |
0 .302 |
30 (- 16 .30543 13 .20444) |
## 3 |
360308 . 1460026 |
17 |
7 |
0 .412 |
32 (- 16 .28914 13 .20374) |
## 4 |
363796 . 1496919 |
24 |
8 |
0 .333 |
20 (- 16 .25869 13 .53742) |
## 5 |
366400 . 1460248 |
26 |
10 |
0 .385 |
28 (- 16 .23294 13 .20603) |
## 6 |
366688 . 1463002 |
18 |
7 |
0 .389 |
17 (- 16 .23041 13 .23094) |
Question 2.3
For the set of priors given in Question 1.2, create maps of 4 different prior predictive draws of prevalence in The Gambia.
Question 2.4
Setting priors in INLA is not as straightforward for these models as it is for the BYM model. In this case, we’ll fit three models using the default priors in INLA and compare the results.
• Exponential
• Matérn using SPDE2
• Matérn with penalized complexity priors
# Mesh construction
coo <- cbind(d$long, d$lat)
mesh <- inla .mesh .2d (
loc = coo, max .edge = c (0.1 , 5),
cutoff = 0.01
)
plot(mesh)
points(coo, col = "red")
# We ' ll change the spde depending on what we want to fit
#Exponential
spde <- inla .spde2 .matern(mesh = mesh, alpha = 3/2 , constr = TRUE)
# SPDE2 Matérn
spde <- inla .spde2 .matern(mesh = mesh, alpha = 2 , constr = TRUE)
# SPDE2 Matérn with penalized complexity priors
spde <- inla .spde2 .pcmatern(mesh = mesh, alpha = 2 , constr = TRUE ,
prior .range = c ( 10 , 0.9), prior .sigma = c ( 1 , 0.01))
indexs <- inla .spde .make .index("s", spde$n .spde)
A <- inla .spde .make .A (mesh = mesh, loc = coo)
## Prediction data
dp <- rasterToPoints (r)
dim(dp)
ra <- aggregate(r, fact = 5 , fun = mean)
dp <- rasterToPoints (ra)
coop <- dp[, c ("x" , "y")]
Ap <- inla .spde .make .A (mesh = mesh, loc = coop)
# stack for estimation stk . e
stk .e <- inla .stack(
tag = "est" ,
data = list(y = d$positive, numtrials = d$total),
A = list(1 , A),
effects = list(data .frame(b0 = 1 , altitude = d$alt), s = indexs)
)
# stack for prediction stk .p
stk .p <- inla .stack(
tag = "pred" ,
data = list(y = NA , numtrials = NA),
A = list(1 , Ap),
effects = list(data .frame(b0 = 1 , altitude = dp[, 3]),
s = indexs
)
)
# stk .full has stk . e and stk .p
stk .full <- inla .stack(stk .e, stk .p)
formula <- y ~ 0 + b0 + altitude + f(s, model = spde)
res <- inla(formula,
family = "binomial" , Ntrials = numtrials,
control.family = list(link = "logit"),
data = inla .stack .data(stk .full),
control .predictor = list(
compute = TRUE , link = 1 ,
A = inla .stack .A (stk .full)
), control .compute = list(config = TRUE),
control .inla = list(int .strategy = 'eb ')
)
Organize the results of the estimated parameters (mean and 95% credible intervals) in a table. Include R code used.
Question 2.5
Create three maps of the results for each fitted model.
• Map 1: Predicted prevalence of Malaria across The Gambia.
• Map 2: The lower limit of the 95% credible interval for predicted prevalence.
• Map 3: The upper limit of the 95% credible interval for predicted prevalence.
Question 2.6
Calculate the probability that Malaria prevalence is > 60% across The Gambia. Map the exceedence prob- abilities for each model.
2023-03-09