MTHM017 Advanced Topics in Statistics Assignment

The assignment has three main parts. Part A involves fitting an auto-regressive model to time series data using the BUGS language and assessing the effect of using different model structures on the estimation of missing data. Part B involves using different methods to classify data into two groups. Part C involves producing a narrated PowerPoint presentation based on Question 3 of Part B.

Part A is worth 50% of your final mark, Part B 30% and Part C 20%. [Assignment: 160 marks in total]

A. Bayesian Inference [80 marks]

The dataset contains measurements of particulate matter (PM10) air pollution in London (measured at the Bexley and Hounslow sites) for 2000 to 2004. The data can be found in London_Pollution.csv.

1. [4 marks] Summarise the two sets of data and calculate the number of missing data points for each monitoring location, by year. Comment on whether the patterns of missingness have changed over time.

2. [3 marks] Plot the PM10 measurements against time for the two sites, highlighting (showing clearly) the periods of missing data.

3. [5 marks] The Eastings and Northings of the two monitoring sites are Bexley: (551862, 176380) and Hounslow: (521070, 178480). Plot these two monitor locations on a map of London and comment on any differences you found in the summaries of the data in the context of the geographical location of the monitoring sites. The necessary shapefiles are on the ELE page of the course.
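The per-year count of missing values asked for in question 1 could be sketched as follows. This is a minimal illustration in Python rather than R, and it runs on a tiny made-up extract: the column names `Date`, `Bexley` and `Hounslow`, the ISO date format and the `NA` token are all assumptions, since the layout of London_Pollution.csv is not described here.

```python
import csv
import io
from collections import Counter

# Tiny made-up extract; the real file's column names and date format may differ.
sample = io.StringIO(
    "Date,Bexley,Hounslow\n"
    "2000-01-01,21.3,NA\n"
    "2000-01-02,NA,NA\n"
    "2001-01-01,18.0,25.1\n"
    "2001-01-02,NA,24.0\n"
)

def missing_by_year(handle, site, date_col="Date", na_token="NA"):
    """Count rows per calendar year where `site` is recorded as missing."""
    counts = Counter()
    for row in csv.DictReader(handle):
        year = int(row[date_col][:4])  # assumes ISO dates: YYYY-MM-DD
        # Adding a boolean adds 1 for a missing value and 0 otherwise,
        # so years with no missing values still appear in the result.
        counts[year] += row[site].strip() in ("", na_token)
    return dict(counts)

bexley_missing = missing_by_year(sample, "Bexley")
sample.seek(0)  # rewind the in-memory file before reading it again
hounslow_missing = missing_by_year(sample, "Hounslow")
```

Comparing the resulting dictionaries year by year is one way to see whether the pattern of missingness changes over time.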

The Bexley data contain missing values. We are going to fit a model that allows us to estimate these missing values by treating them as model parameters to be estimated (so that we obtain posterior distributions for them). As we have time series data, we are going to use the fact that day-to-day measurements will be correlated, i.e. today's measurement will correlate with yesterday's.

A random walk process of order 1, RW(1), is defined at time $t$ as

$$Y_t - Y_{t-1} = w_t$$

$$Y_t = Y_{t-1} + w_t$$

where $w_t$ are realisations of random (or white) noise, e.g. $w_t \sim N(0, \sigma_w^2)$. Note that the first line states that the differences between the values at consecutive time points are white noise.
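To make the RW(1) definition concrete, the sketch below simulates such a process and recovers its first differences, which should behave like white noise. It is written in Python purely for illustration (the assignment itself uses R and JAGS); the function name `simulate_rw1`, the starting value and the noise standard deviation are assumptions chosen for the example.

```python
import random

def simulate_rw1(n, sigma_w, y_start=0.0, seed=0):
    """Simulate n values of Y_t = Y_{t-1} + w_t with w_t ~ N(0, sigma_w^2)."""
    rng = random.Random(seed)
    y = [y_start]
    for _ in range(n - 1):
        # random.gauss takes a standard deviation, not a variance.
        y.append(y[-1] + rng.gauss(0.0, sigma_w))
    return y

path = simulate_rw1(n=1000, sigma_w=2.0)

# First differences Y_t - Y_{t-1}: these are the white-noise terms w_t.
diffs = [later - earlier for earlier, later in zip(path, path[1:])]
mean_diff = sum(diffs) / len(diffs)
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (len(diffs) - 1)
```

With `sigma_w = 2.0`, `mean_diff` should be close to 0 and `var_diff` close to 4, while the path itself wanders without reverting to any fixed level.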

We are interested in fitting a random walk model to the Bexley data. The model will be of the following form:

$$\text{Bexley}_t \sim N(Y_t, \sigma_v^2)$$

$$Y_t \sim N(Y_{t-1}, \sigma_w^2)$$

where $\sigma_w^2$ is the variance of the white noise process associated with the random walk. We then make noisy measurements of this random walk process; thus $\text{Bexley}_t$, the measurement we have at time $t$, equals the true value of the underlying process $Y_t$ plus some measurement error. In the formula above, $\sigma_v^2$ is the variance of this measurement error.

4. [16 marks] Code this model in JAGS, using the model definition below, to analyse the Bexley data from 1st January 2000 to 31st December 2003 (NOTE the end year). Due to the nature of the model, you will have to explicitly specify a distribution for $Y_1$ in the model (i.e. for the first time point, as $Y_0$ doesn't exist). One suggestion might be $Y_1$ ~ dnorm(0, 0.001).

Run the model for 8,000 iterations, with 2 chains, discarding the first 4,000 as 'burn-in'. Produce trace plots for the chains and summaries for the fitted parameters (including the missing data). Hint: You should initialise both chains. One suggestion might be to use the mean and median to initialise the missing values of Bexley, and random uniforms (with a narrow interval centred around, say, 20) to initialise Y.

jags.mod <- function(){
  Y[1] ~ dnorm(0, 1.0E-3)
  for (i in 2:N) {
    Bexley[i] ~ dnorm(Y[i], tau.v)
    Y[i] ~ dnorm(Y[i-1], tau.w)
  }
  # priors
  tau.w ~ dgamma(1, 0.01)
  sigma.w2 <- 1/tau.w
  tau.v ~ dgamma(1, 0.01)
  sigma.v2 <- 1/tau.v
}

5. [3 marks] Comment on whether the chains for all the parameters have converged. You should include evidence that supports your claim.

6. [5 marks] Extract the posterior means and 95% credible intervals for $Y_t$, and plot them against time, together with the original data (the measurements). Comment on the width of the credible interval during the periods of missing data.
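When reading the JAGS model definition given for question 4, it helps to remember that JAGS parameterises `dnorm` by precision (1/variance) and `dgamma` by shape and rate. The short Python sketch below (illustration only; the helper names are hypothetical) translates the priors in that definition into more familiar quantities.

```python
import math

def precision_to_sd(tau):
    """Standard deviation implied by a normal precision tau = 1 / sigma^2."""
    return 1.0 / math.sqrt(tau)

def gamma_mean_var(shape, rate):
    """Mean and variance of a Gamma distribution with the given shape and rate."""
    return shape / rate, shape / rate ** 2

# Y[1] ~ dnorm(0, 1.0E-3): precision 0.001, i.e. variance 1000.
sd_y1 = precision_to_sd(1.0e-3)

# tau.w ~ dgamma(1, 0.01) and tau.v ~ dgamma(1, 0.01): priors on precisions.
tau_prior_mean, tau_prior_var = gamma_mean_var(1.0, 0.01)
```

Here `sd_y1` is roughly 31.6, so the prior on the first state is fairly diffuse on the scale of the data, and the gamma prior on each precision has mean about 100 with a very large variance.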

An alternative model is a random walk process of order 2, RW(2). This assumes that the 'differences between differences' are white noise and is defined at time $t$ as

$$(Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = w_t$$

$$Y_t = 2Y_{t-1} - Y_{t-2} + w_t$$

where again $w_t$ are realisations of random (or white) noise, e.g. $w_t \sim N(0, \sigma_w^2)$.
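The RW(2) recursion can be illustrated in the same way: simulate the process, then difference it twice to recover the white noise. This is a Python sketch for illustration only; `simulate_rw2` and its two starting values are assumptions made for the example, not part of the assignment.

```python
import random

def simulate_rw2(n, sigma_w, y_first=0.0, y_second=0.0, seed=0):
    """Simulate n values of Y_t = 2*Y_{t-1} - Y_{t-2} + w_t, w_t ~ N(0, sigma_w^2)."""
    rng = random.Random(seed)
    y = [y_first, y_second]  # two starting values are needed for an order-2 process
    for _ in range(n - 2):
        y.append(2.0 * y[-1] - y[-2] + rng.gauss(0.0, sigma_w))
    return y

path = simulate_rw2(n=500, sigma_w=1.0)

# Differencing twice recovers the noise terms w_t.
first_diff = [b - a for a, b in zip(path, path[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]
mean_sd = sum(second_diff) / len(second_diff)
var_sd = sum((d - mean_sd) ** 2 for d in second_diff) / (len(second_diff) - 1)
```

Because the noise enters through the slope rather than the level, simulated RW(2) paths are visibly smoother than RW(1) paths, which is relevant when comparing the smoothing behaviour of the two models.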

We are now interested in fitting a random walk model of order 2 to the Bexley data. The model will be of the following form:

$$\text{Bexley}_t \sim N(Y_t, \sigma_v^2)$$

$$Y_t \sim N(2Y_{t-1} - Y_{t-2}, \sigma_w^2)$$

Again, $\sigma_w^2$ is the variance of the white noise process, and $\sigma_v^2$ is the variance of the measurement error.

7. [12 marks] Code this RW(2) model in JAGS to analyse the Bexley data from 1st January 2000 to 31st December 2003 (NOTE the end year). Run the model for 8,000 iterations, discarding the first 4,000 as 'burn-in'. Produce trace plots for the chains and summaries for the fitted parameters (including the missing data). Comment on the differences between the smoothing effects of the two models. For this you might find it useful to plot the outcome for the first quarter of 2000 separately (for both models). Note that getting this model to converge might be quite tricky. Rather than spending much time trying to get it to converge, you should try to explain why we might see a lack of convergence here.

8. [8 marks] Use both of your models to predict the measurements of PM10 at Bexley for the first week of 2004.

9. [6 marks] For both models, plot the predicted values of PM10 for the first week of 2004, along with the actual measurements, against time. By calculating appropriate measures of comparison, comment on how good you think the models are at forecasting. Hint: you may want to re-run the model with an extra line to calculate the root mean squared prediction error, $\sqrt{\frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)^2}$, noting that this value will also have a posterior distribution as it is a function of the predicted values (which are treated as unknown parameters that need to be estimated).
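As a rough illustration of the root mean squared prediction error mentioned in the hint, the Python sketch below computes it for a single set of point predictions. The numbers are made up (not real PM10 data), the function name `rmspe` is hypothetical, and skipping days without an observation is an assumption. Inside JAGS the same quantity would be evaluated at every iteration, which is what gives it a posterior distribution.

```python
import math

def rmspe(predicted, observed):
    """Root mean squared prediction error, skipping days with no observation."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    if not pairs:
        raise ValueError("no observed values to compare against")
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))

# Made-up values for a seven-day forecast window; None marks a missing measurement.
predicted = [20.0, 21.0, 19.5, 22.0, 23.0, 21.5, 20.5]
observed = [22.0, 20.0, None, 25.0, 23.0, 20.5, 21.5]
error = rmspe(predicted, observed)
```

A smaller value indicates forecasts closer to the measurements; comparing the value (or its posterior summary) across the RW(1) and RW(2) models gives one numerical basis for judging forecast quality.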

We are now going to repeat this analysis for the Hounslow site.

10. [8 marks] Fit the RW(1) and RW(2) models in JAGS to the Hounslow data for 2000 to 2003. Use non-informative priors. Comment on how well the chains have converged and how well both models fit the data.

11. [10 marks] Now re-run these analyses using informative priors, using what you have learnt from fitting the Bexley model. By comparing the results (e.g. summaries of the posterior distributions, convergence etc.), comment on any effect that using different priors has (or has not) had on the results.

B. Classification [48 marks]

The following figure shows the information in the dataset Classification.csv: two different groups, plotted against two explanatory variables. This is simulated data; the groupings are determined by a (known, but not to you!) function of X1 and X2 with added noise/random error. The aim is to find a suitable method for classifying the 200 datapoints into two groups from a selection of possible approaches.

[Figure: scatter plot of the observations against the two explanatory variables, with points marked by Group.]

1. [5 marks] Summarise the two groups in terms of the variables X1 and X2. Describe your findings. Considering the plot showing the observations and the numerical summaries, which classification methods do you think are suitable for classifying this data?

2. [1 mark] Select 75% of the data to act as a training set, with the remaining 25% for testing/evaluation.

3. Perform classification using the following methods. In each case, briefly describe how the classification method works, present the results of an evaluation of the method (highlighting different aspects of the model performance) and describe your findings. Where appropriate, optimise the parameters of the method (e.g. by using a ROC curve or cross-validation).

(a) [5 marks] Linear discriminant analysis.

(b) [4 marks] Quadratic discriminant analysis.

(d) [10 marks] Support vector machines.

(e) [8 marks] K-nearest neighbour regression.

4. [3 marks] Compare the results from these five approaches and select what you think is the best method for classification in this case, explaining your reasoning.

5. [4 marks] The file 'ClassificationTrue.csv' contains the true classifications, based on the function of X1 and X2 without the noise. Evaluate how the 5 different methods from Question 3 (in each case using the previously selected optimal parameter values) compare to the truth. Does your choice from Question 4 still perform best in this case?
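The 75%/25% split requested in question 2 of this part could be sketched as below. This is a Python illustration only: the rows are stand-ins rather than the contents of Classification.csv, and the function name, seed and use of simple random sampling (rather than, say, sampling stratified by group) are assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=1):
    """Randomly partition rows into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Stand-in data: 200 (row id, group label) pairs.
rows = [(i, i % 2) for i in range(200)]
train, test = train_test_split(rows)
```

Fixing the seed matters here because every method in question 3 should be trained and evaluated on the same partition for the comparison in question 4 to be fair.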

C. Presentation [32 marks]

The presentation is based on Part B, Question 3 only. You should submit a narrated PowerPoint presentation that is 5 minutes long, and you should aim for 5 slides in total (this could mean 1 slide on each method).

In this you should explain what the problem is, how you approached it, and what your findings are.

You should pay attention to the clarity/pace/coherency of the delivery, the style/information-balance on the slides, clear description of methodology and time management.