
ST420 Assignment 1: Exploring neural networks

25/01/2023

Aims

This assignment, worth 10% of the final module mark, aims to give you hands-on experience of some of the ideas discussed in the early part of the module. Your task is to investigate the use of neural networks for univariate regression.

Submission details

This assignment must be submitted, via the submission portal on moodle, by 12 noon on 10th February 2023. You must submit two files:

1. A two page .pdf file, containing your work, the requirements of which are set out in the tasks below. This pdf may be generated by any means you wish. I suggest the use of R Markdown, but submissions compiled using LaTeX or any other means are also acceptable. The page limit is strict - nothing will be marked after the first two pages. Zero marks will be awarded for files that are not in pdf format. (Note that it is easy to create a pdf from any other format using the print to pdf option.)

2. A single file containing all of the code you used to generate the results.

The usual rules about plagiarism and collusion apply. In particular, your code should be your own; it will be checked for signs of collusion.

Background

The goal of this assignment is to investigate the use of neural networks for regression. You will not be required to write code for training neural networks from scratch, but can make use of existing packages. You are free to use any programming language you wish, but I recommend R, because of the convenient R package neuralnet, which I will be using as an illustration throughout. If you are familiar with Python, for example, you are welcome to use PyTorch instead.

In RStudio, the  neuralnet package can be installed and loaded as follows:

install.packages("neuralnet")
library(neuralnet)

Generating training data

We will be generating data from a sinusoidal function, plus noise:

Y = f(X) + ϵ,    ϵ ∼ N(0, 0.1²),

with

f(x) = sin(15x).

We will take the x-values to be the sequence 0, 0.05, 0.10, …, 1.

x = seq(from=0, to=1, by=0.05)
y = sin(15*x) + rnorm(length(x), sd=0.1)
xy_data = data.frame(x, y)
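
Since the noise is random, you may find it helpful to fix a seed before generating the data, e.g. set.seed(420) (an arbitrary choice), so that your results are reproducible.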

1. (1 mark) Plot your generated training data and the true function f on a single figure.
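
For illustration, here is one way to produce such a plot using base R graphics (a sketch only; the choice of plotting functions, colours and labels is up to you):

plot(xy_data$x, xy_data$y, pch=16, xlab="x", ylab="y")   # training data
curve(sin(15*x), from=0, to=1, add=TRUE, col="red")      # true function f
legend("topright", legend=c("data", "true f"),
       pch=c(16, NA), lty=c(NA, 1), col=c("black", "red"))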

2. (1 mark) Throughout this question we will use squared loss. Using a regular grid of x values, and by generating new y-values, estimate the risk of the true function f . (This is an estimate of the optimal risk.)
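
As a sketch of one possible approach (the grid spacing below is my choice, not prescribed): generate fresh y-values on a grid and average the squared errors of f. Since the risk of f under squared loss is E[(Y − f(X))²] = Var(ϵ) = 0.01, your estimate should be close to this value.

x_grid = seq(from=0, to=1, by=0.001)                     # a fine regular grid
y_new  = sin(15*x_grid) + rnorm(length(x_grid), sd=0.1)  # fresh responses
risk_f = mean((y_new - sin(15*x_grid))^2)                # estimated risk of f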

Fitting neural networks

You can easily fit a neural network in R using the function  neuralnet ; see  ?neuralnet for the details:

xy_nn = neuralnet(y ~ x, xy_data, hidden=c(2), act.fct="logistic",
                  lifesign="full", lifesign.step=5000, stepmax=10^6, rep=1)

You can use the default choices of optimisation algorithm, learning parameters and initialisation. Note that since both the initialisation and the training procedure are stochastic, it may be the case that for certain runs the training algorithm does not converge within the maximum number of steps. Simply re-run the code in this case (it should converge most of the time).
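
If you prefer not to re-run by hand, one optional way to automate this is sketched below (the helper name fit_nn is mine, and the sketch assumes that neuralnet signals non-convergence via a warning):

fit_nn = function(max_tries = 10) {
  for (i in 1:max_tries) {
    fit = tryCatch(
      neuralnet(y ~ x, xy_data, hidden=c(2), act.fct="logistic",
                stepmax=10^6, rep=1),
      warning = function(w) NULL)               # NULL if training did not converge
    if (!is.null(fit) && !is.null(fit$weights)) return(fit)
  }
  stop("no convergence after max_tries attempts")
}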

To visualise the network and see the values of the weights and biases, you can use  plot :

plot(xy_nn)

You can see the output, i.e. the predictions, of the neural network using the  predict function:

x_pred = seq(from=0, to=1, by=0.01)
y_pred = predict(xy_nn, newdata = as.matrix(x_pred))

3. (3 marks) Adapt the code to train a neural network with 1 hidden layer containing 3 neurons, using the tanh activation function. (I.e. there are a total of three layers: the input layer, 1 hidden layer, and then the output layer.) Plot the predicted values of this neural network on the set of x_pred points, along with the true function f, on the same plot. (Do not be alarmed if the fit is poor.)
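
A sketch of the adaptation (reusing xy_data and x_pred from above; apart from hidden and act.fct, the arguments are as before):

xy_nn3  = neuralnet(y ~ x, xy_data, hidden=c(3), act.fct="tanh",
                    stepmax=10^6, rep=1)               # 3 hidden neurons, tanh
y_pred3 = predict(xy_nn3, newdata=as.matrix(x_pred))   # predictions on the grid
plot(x_pred, y_pred3, type="l", xlab="x", ylab="y")    # fitted curve
curve(sin(15*x), from=0, to=1, add=TRUE, col="red")    # true function f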

4. (1 mark) Generate an independent test set of 10 points from the model (you can draw the x values uniformly in [0, 1]), and estimate the risk of the neural net estimator on this test set.
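
For example (a sketch; here xy_nn3 is the network fitted in question 3, and the risk estimate is the average squared prediction error on the test set):

x_test    = runif(10)                              # 10 uniform test inputs
y_test    = sin(15*x_test) + rnorm(10, sd=0.1)     # corresponding responses
pred_test = predict(xy_nn3, newdata=as.matrix(x_test))
risk_hat  = mean((y_test - pred_test)^2)           # estimated risk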

5. (1 mark) Repeat questions 3-4 for a neural network with 1 hidden layer and 4 neurons.

6. (3 marks) Now plot estimates of the risk for networks trained with 1 hidden layer, with widths ranging from 5 to 30. (As before, due to the stochastic nature of neural network training, you may need to re-run the code for certain widths if it does not converge.) When the number of parameters exceeds the number of training points, this is known as the overparameterized regime. Comment on the behaviour of the neural network in the overparameterized regime; is there evidence of overfitting?
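
One possible structure for this (a sketch; it reuses the test set from question 4, and you may prefer a larger test set for a more stable estimate):

widths = 5:30
risks  = numeric(length(widths))
for (i in seq_along(widths)) {
  # re-run manually (or use a retry loop) if a given width fails to converge
  nn_w     = neuralnet(y ~ x, xy_data, hidden=c(widths[i]), act.fct="tanh",
                       stepmax=10^6, rep=1)
  pred_w   = predict(nn_w, newdata=as.matrix(x_test))
  risks[i] = mean((y_test - pred_w)^2)
}
plot(widths, risks, type="b", xlab="hidden layer width", ylab="estimated risk")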