STAT 4051: Homework 5

Due: Friday, Dec 1 at 5:00 pm on Canvas.

Please submit your solutions as a single PDF file. You do not need to submit your code. Do not submit an R Markdown or Jupyter file. You may use these tools to generate your results, but your solutions must be submitted as a PDF file. You can include any hand-written parts as figures. Make sure they are directly readable.

Honor Policy: You may discuss homework problems with others, but you must finish your assignment independently based on your own understanding. Copying others’ work is not allowed. Please indicate your collaborators.

(10 pts)   Problem 1   The file "IsingHW6.Rda" contains two objects:

• X: a 1004 × 20 binary data matrix with Xij ∈ {0, 1}. Each column corresponds to a neuron in the human body, and each row corresponds to a stimulus in an experiment on neuron activation. In particular, Xij = 1 means that the jth neuron is activated in the ith experiment.

• A: a 20 × 20 binary matrix such that Aij = 1 means neuron i and neuron j are spatially adjacent to each other. So A also defines a graph on the neurons.

Fit the Ising model on the data for a sequence of λ values to produce a sequence of estimated graphs. Compare your estimated graphs with the spatial graph from A. What are the similarities or dissimilarities between them? Pick one from your estimated graphs that looks the most similar to the A graph. Comment on the comparison.
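One possible way to make the comparison with A concrete is to count the edges that two graphs share and the edges that appear in only one of them. The sketch below is in Python and is only an illustration: the function names `edge_set` and `compare_graphs` and the small 4-node matrices are hypothetical, and treating a pair as an edge when either direction is nonzero is an assumption (an "or" rule), not something the assignment prescribes.

```python
import numpy as np

def edge_set(adj):
    """Undirected edges (i, j) with i < j from a 0/1 adjacency matrix.

    Assumption: a pair counts as an edge if either adj[i, j] or adj[j, i]
    is nonzero. The diagonal is ignored.
    """
    adj = np.asarray(adj)
    sym = (adj != 0) | (adj.T != 0)
    i, j = np.nonzero(np.triu(sym, k=1))
    return set(zip(i.tolist(), j.tolist()))

def compare_graphs(est_adj, ref_adj):
    """Count edges shared by both graphs and edges unique to each one."""
    est, ref = edge_set(est_adj), edge_set(ref_adj)
    return {
        "shared": len(est & ref),
        "only_estimated": len(est - ref),
        "only_reference": len(ref - est),
    }

# Toy 4-node example (made-up matrices, not the homework data).
A_demo = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])    # path 0-1-2-3
est_demo = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 1],
                     [0, 0, 1, 0]])  # edges 0-1, 0-2, 2-3
summary = compare_graphs(est_demo, A_demo)
```

Computing such a summary for each λ gives one numerical view of how the estimated graphs move toward or away from the spatial graph; it does not replace plotting and commenting on the graphs.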

(5 pts)   Problem 2   Recall the problem in HW5 about the stock data. Start from your matrix X, which is the matrix of log returns. First, standardize X so that each column has mean zero and variance one. Denote the standardized matrix by Y.
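As a rough illustration of the standardization step, here is a minimal Python sketch on synthetic data. The function name `standardize_columns` and the simulated matrix are hypothetical, and using the sample variance with n − 1 in the denominator is an assumption, since the assignment does not say which denominator to use.

```python
import numpy as np

def standardize_columns(X):
    """Center each column to mean zero and scale it to sample variance one."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Assumption: ddof=1, i.e. the sample variance with n - 1 in the denominator.
    return centered / X.std(axis=0, ddof=1)

# Synthetic stand-in for the log-return matrix (79 rows, 10 columns).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=0.01, scale=0.02, size=(79, 10))
Y_demo = standardize_columns(X_demo)
```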

Previously, in HW5, we fitted a Gaussian graphical model to X’s correlation matrix, which is the same as Y’s covariance matrix. Note that we assume each row follows a 10-dimensional normal distribution N(0, Σ). Last time, you picked the tuning parameter λ by looking at the graph. This time, we choose the parameter by model validation as follows:

• Split Y into its first 60 rows and its last 19 rows, and denote them by Y(1) (training) and Y(2) (validation).

• Then use Y(1) as your data to fit a Gaussian graphical model for a sequence of λ values.

• For each λ, you will obtain an estimated covariance matrix Σ̂. Calculate the corresponding log-likelihood of Y(2) under the currently estimated model N(0, Σ̂). (Hint: in R, you can use dmvnorm in the R package emdbook for this. Use ?dmvnorm to see the instructions.)

• Select the λ that gives the highest log-likelihood of the validation data Y(2).

• Use that selected λ to fit the Gaussian graphical model again, but based on the full data set Y. Return it as your final model.
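The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference solution: the names `gaussian_loglik`, `shrinkage_cov`, and `select_lambda` are hypothetical, the data are simulated, and `shrinkage_cov` is a simple stand-in that shrinks the sample covariance toward the identity. It is NOT the graphical lasso; in the actual homework, the Gaussian graphical model fit from HW5 would take its place.

```python
import numpy as np

def gaussian_loglik(Y, Sigma):
    """Total log-likelihood of the rows of Y under N(0, Sigma)."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # Sum over rows of the quadratic forms y_i^T Sigma^{-1} y_i.
    quad = np.sum(Y * np.linalg.solve(Sigma, Y.T).T)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

def shrinkage_cov(Y, lam):
    """Hypothetical stand-in estimator (NOT the graphical lasso):
    shrink the zero-mean sample covariance toward the identity."""
    S = Y.T @ Y / Y.shape[0]
    return (1 - lam) * S + lam * np.eye(S.shape[0])

def select_lambda(Y, lambdas, fit_cov, n_train=60):
    """Fit on the first n_train rows, score each lambda on the remaining rows."""
    Y_train, Y_valid = Y[:n_train], Y[n_train:]
    scores = [gaussian_loglik(Y_valid, fit_cov(Y_train, lam)) for lam in lambdas]
    best = lambdas[int(np.argmax(scores))]
    return best, scores

# Simulated stand-in for the standardized matrix Y (79 rows, 10 columns).
rng = np.random.default_rng(1)
Y_demo = rng.standard_normal((79, 10))
lambdas = [0.05, 0.1, 0.2, 0.4, 0.8]
best_lam, scores = select_lambda(Y_demo, lambdas, shrinkage_cov)
final_Sigma = shrinkage_cov(Y_demo, best_lam)  # refit on all rows
```

Passing the estimator in as the `fit_cov` argument keeps the validation logic separate from the model fit, so the stand-in can be swapped for any function that maps training data and λ to a covariance estimate.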

Questions:

1. Plot the graph structure of your final model. Compare it with the one you used in HW5. Comment.

2. The above procedure can be seen as a validation set approach for model selection. Do you think it would make sense to use a K-fold cross-validation or leave-one-out cross-validation approach in this setting? Why?