StatComp Project 2: Scottish weather

Two main template files are available; modify these, but keep their names:

• a template RMarkdown document, report.Rmd, that you should expand with your answers and solutions for this project assignment,

• a template functions.R file where you should place function definitions needed by report.Rmd, with associated documentation.

Instructions:

1. Code that displays results (e.g., as tables or figures) must be placed as code chunks in report.Rmd.

2. Code in the template files includes an example of how to save and load results of long-running calculations.

3. Code chunks are included in report.Rmd that load the function definitions from functions.R (your own function definitions).

4. Appendix code chunks are included that display the code from functions.R without running it.

5. Use the styler package Addin to restyle code for better and more consistent readability; it works for both .R and .Rmd files.

6. As in project 1, use echo=TRUE for analysis code chunks and echo=FALSE for table display and plot-generating code chunks.

Make sure you include your name and student number in both report.Rmd and functions.R. When submitting, include a generated html version of the report with the file name report.html.

The project will be marked as a whole, between 0 and 40, using the marking guide posted on Learn.

Scottish weather data

The Global Historical Climatology Network at https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily provides historical weather data collected from all over the globe. A subset of the daily resolution data set is available in the StatCompLab package, containing data from eight weather stations in Scotland and covering the time period from 1 January 1960 to 31 December 2018. Some of the measurements are missing, either due to instrument problems or data collection issues. See Tutorial07 for an exploratory introduction to the data, and techniques for wrangling the data.

Load the data with

data(ghcnd_stations, package = "StatCompLab")

data(ghcnd_values, package = "StatCompLab")

The ghcnd_stations data frame has 5 variables:

•  ID: The identifier code for each station

•  Name: The human-readable station name

•  Latitude: The latitude of the station location, in degrees

•  Longitude: The longitude of the station location, in degrees

•  Elevation: The station elevation, in metres above sea level

The station data set is small enough that you can view the whole thing, e.g. with knitr::kable(ghcnd_stations). You can try to find some of the locations on a map (Google maps and other online map systems can usually interpret latitude and longitude searches).

The ghcnd_values data frame has 7 variables:

•  ID: The station identifier code for each observation

•  Year:  The year the value was measured

•  Month:  The month the value was measured

•  Day:  The day of the month the value was measured

• DecYear: “Decimal year”, the measurement date converted to a fractional value, where whole numbers correspond to 1 January, and fractional values correspond to later dates within the year. This is useful for both plotting and modelling.

• Element: One of “TMIN” (minimum temperature), “TMAX” (maximum temperature), or “PRCP” (precipitation), indicating what each value in the Value variable represents

• Value: Daily measured temperature (in degrees Celsius) or precipitation (in mm)

Seasonal variability

Note: For this first half of the assignment, you can essentially ignore that some of the data are missing; when defining averages, just take the average of whatever observations are available in the relevant data subset. If you construct averages with group_by(), summarise(), and mean(), that will happen by itself.

Is precipitation seasonally varying?

The daily precipitation data is more challenging to model than temperature because of the mix of zeros and positive values. Plot temperature and precipitation data to show the behaviour, e.g., during one of the years, for all the stations.

• From the plots, is there a seasonal effect for temperature (“TMIN” and “TMAX”) and for precipitation (“PRCP”)? e.g., is it colder in the winter than in the summer?

Let winter be {Jan, Feb, Mar, Oct, Nov, Dec}, and let summer be {Apr, May, Jun, Jul, Aug, Sep}.  For easier code structure, add this season information to the weather data object.
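As a minimal sketch of the season labelling described above, the following uses a small made-up data frame (toy_values, a hypothetical stand-in for the PRCP rows of ghcnd_values) and base R only; with dplyr, the same columns could be added with mutate(). The Season column name is an illustrative choice, not something the assignment prescribes.

```r
# Hypothetical toy stand-in for ghcnd_values; the real data comes from StatCompLab.
toy_values <- data.frame(
  ID = rep(c("A", "B"), each = 6),
  Month = rep(c(1, 4, 7, 9, 10, 12), times = 2),
  Element = "PRCP",
  Value = c(5, 1, 0, 2, 6, 4, 3, 0, 1, 0, 2, 5)
)

# Summer is TRUE for April to September; all other months count as winter.
toy_values$Summer <- toy_values$Month %in% 4:9
toy_values$Season <- ifelse(toy_values$Summer, "Summer", "Winter")

# Average precipitation per station and season, over whatever rows are available.
season_means <- aggregate(Value ~ ID + Season, data = toy_values, FUN = mean)
print(season_means)
```

Storing the season as a logical Summer column keeps later subsetting short, since value[summer] and value[!summer] pick out the two groups directly.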

Construct a Monte Carlo permutation test for the hypotheses:

H0: The rainfall distribution is the same in winter as in summer

H1: The winter and summer distributions have different expected values

Use T = |winter average − summer average| as the test statistic, and add a Summer column to the data that is TRUE for data in the defined summer months. Compute separate p-values for each weather station, together with their Monte Carlo standard deviations. Write a function p_value_CI that constructs an approximate confidence interval for each p-value.

Collect the results into a suitable data structure, present and discuss the results.

See Project2Hints for hints on constructing an approximate 95% confidence interval for a p-value when most observed counts are zero.
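One possible shape for the permutation test and the interval function is sketched below on simulated data. The helper perm_test and its arguments are hypothetical names. The zero-count rule in p_value_CI (upper limit 1 − 0.025^(1/N) when no permuted statistic reaches the observed one) is an assumption about what is intended, so check it against Project2Hints before relying on it.

```r
# Monte Carlo permutation test for one station (hypothetical helper).
perm_test <- function(value, summer, n_perm = 1000) {
  test_stat <- function(v, s) abs(mean(v[!s]) - mean(v[s]))
  t_obs <- test_stat(value, summer)
  # Under H0 the season labels are exchangeable, so permute them.
  t_perm <- replicate(n_perm, test_stat(value, sample(summer)))
  x <- sum(t_perm >= t_obs)
  p_hat <- x / n_perm
  list(
    t_obs = t_obs, count = x, n_perm = n_perm, p_value = p_hat,
    # Binomial Monte Carlo standard deviation of the estimated p-value.
    p_sd = sqrt(p_hat * (1 - p_hat) / n_perm)
  )
}

# Approximate interval for a p-value estimated from x exceedances in N permutations.
# Assumption: normal approximation when x > 0, binomial-based bound when x == 0.
p_value_CI <- function(x, N, level = 0.95) {
  alpha <- 1 - level
  p_hat <- x / N
  if (x == 0) {
    # Largest p with P(X = 0) = (1 - p)^N >= alpha / 2.
    return(c(lower = 0, upper = 1 - (alpha / 2)^(1 / N)))
  }
  half <- qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / N)
  c(lower = max(0, p_hat - half), upper = min(1, p_hat + half))
}

# Simulated example: winter values are larger on average than summer values.
set.seed(1)
summer <- rep(c(TRUE, FALSE), each = 100)
value <- c(rexp(100, rate = 1 / 2), rexp(100, rate = 1 / 4))
res <- perm_test(value, summer, n_perm = 500)
ci <- p_value_CI(res$count, res$n_perm)
print(c(p_value = res$p_value, p_sd = res$p_sd, ci))
```

Applying perm_test to each station's PRCP rows in turn (for example with lapply over split data) and binding the results into one data frame gives a structure that is easy to tabulate.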

Spatial weather prediction

For this second half of the project, you should first construct a version of the data set with a new variable, Value_sqrt_avg, defined as the square root of the monthly averaged precipitation values. As noted in the Project2Hints document, the precipitation values are very skewed, with variance increasing with the mean value. Taking the square root of the monthly averages helps alleviate that issue, making a constant-variance model more plausible.
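The construction of Value_sqrt_avg can be illustrated with a small made-up data frame; toy_prcp and the intermediate Value_avg column are hypothetical names, and base R is used so that the sketch has no package dependencies. Note the order of operations: the values are averaged first, and the square root is taken of the average.

```r
# Hypothetical toy stand-in for the PRCP rows of ghcnd_values.
toy_prcp <- data.frame(
  ID = rep(c("A", "B"), each = 4),
  Year = 2000,
  Month = rep(c(1, 1, 2, 2), times = 2),
  Value = c(4, 12, 0, 2, 9, 9, 16, 0)
)

# Average within each station, year and month, then take the square root.
monthly <- aggregate(Value ~ ID + Year + Month, data = toy_prcp, FUN = mean)
names(monthly)[names(monthly) == "Value"] <- "Value_avg"
monthly$Value_sqrt_avg <- sqrt(monthly$Value_avg)
print(monthly)
```

For modelling, each monthly row also needs a time value; one option, stated here as an assumption, is to average DecYear within each month in the same aggregation step.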

Estimation and prediction

Here, you will define and estimate models for the square root of the monthly averaged precipitation values in Scotland.

As a basic model for the square root of monthly average precipitation, define

M0 : Value_sqrt_avg ~ Intercept + Longitude + Latitude + Elevation + DecYear

The M0 covariates are the spatial coordinates, the elevation, and time. By adding covariates cos(2πkt) and sin(2πkt) of frequency k = 1, 2, ... we can also model the seasonal variability, defining models M1, M2, M3, and M4, where the predictor expression for model M_K adds

$$\sum_{k=1}^{K} \left[ \gamma_{c,k} \cos(2\pi k t) + \gamma_{s,k} \sin(2\pi k t) \right]$$

to the M0 predictor expression, for model coefficients $\gamma_{c,k}$ and $\gamma_{s,k}$. The time variable t is defined to be DecYear, so that the lowest frequency k = 1 corresponds to a cosine function with a period of one full year.
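One way to generate the model formulas for M0 to M4 programmatically is sketched below and fitted with lm() on simulated data. The helper name make_formula, the toy data, and the choice of lm() as the estimation method are all assumptions made for illustration; the assignment does not prescribe them.

```r
# Build the predictor formula for model M_K; K = 0 gives M0.
# make_formula is a hypothetical helper name.
make_formula <- function(K) {
  base <- "Value_sqrt_avg ~ Longitude + Latitude + Elevation + DecYear"
  if (K == 0) {
    return(as.formula(base))
  }
  k <- seq_len(K)
  harmonics <- c(
    sprintf("cos(2 * pi * %d * DecYear)", k),
    sprintf("sin(2 * pi * %d * DecYear)", k)
  )
  as.formula(paste(base, paste(harmonics, collapse = " + "), sep = " + "))
}

# Simulated monthly data with an annual cycle of amplitude 0.4 (made-up values).
set.seed(2)
n <- 240
toy <- data.frame(
  Longitude = runif(n, -6, -2),
  Latitude = runif(n, 55, 59),
  Elevation = runif(n, 0, 300),
  DecYear = 2000 + (seq_len(n) - 0.5) / 12
)
toy$Value_sqrt_avg <- 1.5 + 0.4 * cos(2 * pi * toy$DecYear) + rnorm(n, sd = 0.2)

# Estimate M0, ..., M4 with the same code, changing only K.
fits <- lapply(0:4, function(K) lm(make_formula(K), data = toy))
names(fits) <- paste0("M", 0:4)

# Prediction for manually constructed covariate values at a new location and time.
new_site <- data.frame(
  Longitude = -3.2, Latitude = 55.9, Elevation = 50, DecYear = 2010.5
)
pred <- predict(fits$M2, newdata = new_site, se.fit = TRUE)
print(c(fit = unname(pred$fit), se = unname(pred$se.fit)))
```

Because the formula and the data are both arguments, the same estimation code can be rerun on any subset of the data without modification.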

Organise your code so that you can easily change the input data without having to change the estimation code and predict for new locations and times.  You’ll need to be able to run the estimation for different subsets of the data, as well as for manually constructed covariate values, e.g., to illustrate prediction at a new location not present in the available weather data.

• Estimate the model parameters for M0, M1, M2, M3, and M4.

Present and discuss the different models and the results of the model estimation.

Assessment:  Station and season differences

We are interested in how well the model can predict precipitation at new locations. Therefore, construct a stratified cross-validation that groups the data by weather station and computes the prediction scores for each station, as well as the overall cross-validated average scores, aggregated to the 12 months of the year.

Present and discuss the results of the score assessments for both Squared Error and Dawid-Sebastiani scores.

• Is the model equally good at predicting the different stations?

• Is the prediction accuracy the same across the whole year?
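The station-wise cross-validation and the two scores can be sketched as follows, on simulated data. This assumes that grouping by station means leaving out one station at a time, that the model is fitted with lm(), and that the predictive standard deviation combines the standard error of the fit with the residual scale. The helper names score_se, score_ds, and cv_by_station are hypothetical.

```r
# Squared Error and Dawid-Sebastiani scores for a prediction with mean mu and sd sigma.
score_se <- function(y, mu) (y - mu)^2
score_ds <- function(y, mu, sigma) ((y - mu) / sigma)^2 + 2 * log(sigma)

# Leave-one-station-out cross-validation (hypothetical helper).
cv_by_station <- function(formula, data) {
  out <- lapply(unique(data$ID), function(id) {
    test <- data[data$ID == id, ]
    fit <- lm(formula, data = data[data$ID != id, ])
    pred <- predict(fit, newdata = test, se.fit = TRUE)
    # Predictive sd: coefficient uncertainty plus residual noise.
    sigma <- sqrt(pred$se.fit^2 + pred$residual.scale^2)
    data.frame(
      ID = id, Month = test$Month,
      SE = score_se(test$Value_sqrt_avg, pred$fit),
      DS = score_ds(test$Value_sqrt_avg, pred$fit, sigma)
    )
  })
  do.call(rbind, out)
}

# Simulated data: six made-up stations, five years of monthly values.
set.seed(3)
stations <- data.frame(
  ID = paste0("S", 1:6),
  Longitude = runif(6, -6, -2),
  Latitude = runif(6, 55, 59),
  Elevation = runif(6, 0, 300)
)
times <- expand.grid(
  Year = 2000:2004, Month = 1:12, ID = stations$ID,
  stringsAsFactors = FALSE
)
toy <- merge(stations, times, by = "ID")
toy$DecYear <- toy$Year + (toy$Month - 0.5) / 12
toy$Value_sqrt_avg <- 1.5 + 0.4 * cos(2 * pi * toy$DecYear) +
  rnorm(nrow(toy), sd = 0.2)

m1 <- Value_sqrt_avg ~ Longitude + Latitude + Elevation + DecYear +
  cos(2 * pi * DecYear) + sin(2 * pi * DecYear)
scores <- cv_by_station(m1, toy)

# Average scores per station and per month of the year.
by_station <- aggregate(cbind(SE, DS) ~ ID, data = scores, FUN = mean)
by_month <- aggregate(cbind(SE, DS) ~ Month, data = scores, FUN = mean)
print(by_station)
print(by_month)
```

Keeping one row of scores per held-out observation makes it straightforward to aggregate by station, by month, or by both.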