

MATH 475 - Spring 2022 – Homework 7

2022

Questions

1. (a) Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous, piecewise linear function with $n$ breakpoints $z_1 < \dots < z_n$, i.e.

$$f(x) = a_i x + b_i, \qquad z_i < x < z_{i+1},$$

(here $z_0 = -\infty$ and $z_{n+1} = +\infty$). Show that $f(x)$ can be written as

$$f(x) = a_0 x + b_0 + \sum_{i=1}^{n} (a_i - a_{i-1})\,\sigma(x - z_i),$$

where $\sigma$ is the ReLU activation function. Use this to deduce that $f$ can be expressed as a ReLU network with one hidden layer of width $n + 2$.
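The identity in part (a) can be checked numerically. The sketch below, assuming NumPy, builds a one-hidden-layer network from the breakpoints $z_i$, the slopes $a_0, \dots, a_n$ and the intercept $b_0$: the two units $\sigma(x)$ and $\sigma(-x)$ reproduce the linear term $a_0 x$ (since $x = \sigma(x) - \sigma(-x)$), the $n$ units $\sigma(x - z_i)$ carry the slope changes, and $b_0$ is the output bias, for a total width of $n + 2$. The function names are illustrative, not taken from the course materials.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def piecewise_linear(x, z, a, b0):
    """Evaluate the continuous piecewise linear f directly.

    z: breakpoints z_1 < ... < z_n; a: slopes a_0, ..., a_n (length n + 1);
    b0: intercept of the leftmost piece. The other intercepts b_i follow
    from continuity at z_i: a_{i-1} z_i + b_{i-1} = a_i z_i + b_i.
    """
    b = [b0]
    for i in range(1, len(a)):
        b.append(b[-1] + (a[i - 1] - a[i]) * z[i - 1])
    k = np.searchsorted(z, x, side="right")  # index of the piece containing x
    return np.asarray(a)[k] * x + np.asarray(b)[k]

def relu_network(x, z, a, b0):
    """One hidden layer of width n + 2 realizing the formula in part (a)."""
    x = np.asarray(x, dtype=float)[:, None]              # shape (N, 1)
    W1 = np.concatenate([[1.0, -1.0], np.ones(len(z))])  # input -> hidden weights
    c1 = np.concatenate([[0.0, 0.0], -np.asarray(z)])    # hidden biases: 0, 0, -z_i
    h = relu(x * W1 + c1)                                 # sigma(x), sigma(-x), sigma(x - z_i)
    W2 = np.concatenate([[a[0], -a[0]], np.diff(a)])      # a_0, -a_0, a_i - a_{i-1}
    return h @ W2 + b0                                     # output bias b_0

# Usage: three breakpoints, four slopes
z = np.array([-1.0, 0.5, 2.0])
a = np.array([1.0, -2.0, 0.5, 3.0])
x = np.linspace(-4.0, 4.0, 101)
print(np.max(np.abs(piecewise_linear(x, z, a, 0.3) - relu_network(x, z, a, 0.3))))
```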

(b) Show that the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = \min\{0, \max\{x, y\}\}$ can be written as a ReLU network with two hidden layers. Describe the affine maps and the widths of the layers.

Hint: first show that $\max\{x, y\} = \sigma(x) - \sigma(-x) + \sigma(y - x)$.
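The hint and one candidate architecture for part (b) can be sanity-checked numerically. In the NumPy sketch below, hidden layer 1 (width 3) computes $\sigma(x)$, $\sigma(-x)$, $\sigma(y - x)$, whose combination with weights $(1, -1, 1)$ is $\max\{x, y\}$ by the hint; hidden layer 2 (width 1) then uses $\min\{0, t\} = -\sigma(-t)$. This is one possible construction under these assumptions, not necessarily the intended one, and the function name is illustrative.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def min0_max_net(x, y):
    """Two-hidden-layer ReLU network for f(x, y) = min{0, max{x, y}}."""
    # Hidden layer 1 (width 3): affine map (x, y) -> (x, -x, y - x), then ReLU.
    h1 = relu(np.stack([x, -x, y - x], axis=-1))
    # Hidden layer 2 (width 1): affine map h1 -> -(h1_1 - h1_2 + h1_3) = -max{x, y}, then ReLU.
    h2 = relu(-(h1 @ np.array([1.0, -1.0, 1.0])))
    # Output affine map: min{0, t} = -relu(-t).
    return -h2

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 1000))
# Hint identity: max{x, y} = relu(x) - relu(-x) + relu(y - x)
print(np.allclose(np.maximum(x, y), relu(x) - relu(-x) + relu(y - x)))
print(np.allclose(min0_max_net(x, y), np.minimum(0.0, np.maximum(x, y))))
```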

(c) Show that the function $f : \mathbb{R}^3 \to \mathbb{R}$ given by $f(x_1, x_2, x_3) = \max\{x_1, x_2, x_3\}$ can be written as a ReLU network with two hidden layers. Describe the affine maps and the widths of the layers.

Hint: observe that $f(x_1, x_2, x_3) = \max\{x_1, \max\{x_2, x_3\}\}$, and use part (b).
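Following the hint for part (c), one candidate construction applies the max gadget from part (b) twice. In the NumPy sketch below (illustrative names), hidden layer 1 has width 5: $\sigma(x_2)$, $\sigma(-x_2)$, $\sigma(x_3 - x_2)$ encode $m = \max\{x_2, x_3\}$, while $\sigma(x_1)$, $\sigma(-x_1)$ carry $x_1$ forward. Hidden layer 2 has width 3 and computes $\sigma(m)$, $\sigma(-m)$, $\sigma(x_1 - m)$, so the output $\sigma(m) - \sigma(-m) + \sigma(x_1 - m) = \max\{x_1, m\}$. Other choices of widths are possible.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def max3_net(x1, x2, x3):
    """Two-hidden-layer ReLU network for max{x1, x2, x3} (widths 5 and 3)."""
    # Hidden layer 1 (width 5): rows give x2, -x2, x3 - x2, x1, -x1 before the ReLU.
    A1 = np.array([
        [0.0, 1.0, 0.0],
        [0.0, -1.0, 0.0],
        [0.0, -1.0, 1.0],
        [1.0, 0.0, 0.0],
        [-1.0, 0.0, 0.0],
    ])
    X = np.stack([x1, x2, x3], axis=-1)  # shape (N, 3)
    h1 = relu(X @ A1.T)                  # shape (N, 5)
    # Hidden layer 2 (width 3): with m = h1_1 - h1_2 + h1_3 = max{x2, x3}
    # and x1 = h1_4 - h1_5, the rows give m, -m, x1 - m before the ReLU.
    A2 = np.array([
        [1.0, -1.0, 1.0, 0.0, 0.0],
        [-1.0, 1.0, -1.0, 0.0, 0.0],
        [-1.0, 1.0, -1.0, 1.0, -1.0],
    ])
    h2 = relu(h1 @ A2.T)                 # shape (N, 3)
    # Output: relu(m) - relu(-m) + relu(x1 - m) = m + max{x1 - m, 0} = max{x1, m}
    return h2 @ np.array([1.0, -1.0, 1.0])

rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 500))
print(np.allclose(max3_net(x1, x2, x3), np.maximum(np.maximum(x1, x2), x3)))
```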

2. For this exercise, we will be using Google Colab to train deep neural networks on a function regression task using the provided “….ipynb” Python notebook. Please read the descriptions above each block of code in the notebook, and modify the code to answer the questions below. For parts (a)-(c), use “example = 1” so that the function is

$$f_1(x) = \log(\sin(10x) + 2) + \sin(x).$$

(a) For each $m_{\text{train}} \in \{100, 200, 300, 400, 500, 1000, 2000\}$, generate data using the code in the first block, and train a ReLU deep neural network with 5 hidden layers and 50 nodes per hidden layer, recording the testing error defined as

$$\varepsilon_{\text{test}} = \sqrt{\frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} |f(x_i) - \Phi(x_i)|^2},$$

which is output on line 147 of the second code block at the end of execution from the variable “test_err”. What do you observe?

(b) Now, with $m_{\text{train}} = 500$ and the number of hidden layers $l = 20$, for each $n \in \{200, 400, 800\}$ train a ReLU DNN with $n$ nodes on each hidden layer. What do you observe? How do the values of $\varepsilon_{\text{test}}$ you observe here compare to those from part (a)? Using the third code block, generate animations of the training process and describe what you see. Note: since training these larger networks takes much longer, it is recommended to set the ‘test_interval’ parameter large, e.g. 100 or 1000.

(c) Repeat the experiment in part (b), but now use the He normal initializer by uncommenting lines 68 and 72 of block 2 of the code, replacing the weight and bias initializers given by “initializers.RandomNormal(stddev = 0.1)” with the “initializers.HeNormal()” initializer. What do you observe? How do the values of $\varepsilon_{\text{test}}$ compare in this case?
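One way to see why the initializer matters for the 20-layer networks in parts (b) and (c) is to push inputs through a ReLU network at initialization and measure how the activation scale changes with depth. The NumPy sketch below is independent of the course notebook and rests on assumptions: biases are zero, the inputs are 1-D points on $[-1, 1]$, and He initialization is approximated by a plain normal with standard deviation $\sqrt{2/\text{fan\_in}}$ (Keras's `HeNormal` draws from a truncated normal with that standard deviation). For ReLU layers of width $n$ with weight standard deviation $s$, the mean-square activation is multiplied by roughly $n s^2 / 2$ per layer: about 1 for $s = 0.1$, $n = 200$, about 4 for $s = 0.1$, $n = 800$, and about 1 for He initialization at any width.

```python
import numpy as np

def activation_growth(width, depth, weight_std, n_samples=256, seed=0):
    """RMS activation of the last hidden layer divided by that of the first.

    weight_std: a float s for N(0, s^2) weights (cf. RandomNormal(stddev=0.1)),
    or "he" for s = sqrt(2 / fan_in) (a plain-normal stand-in for HeNormal).
    Biases are zero in this sketch (an assumption).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, 1))  # 1-D regression inputs

    def std(fan_in):
        return np.sqrt(2.0 / fan_in) if weight_std == "he" else weight_std

    # First hidden layer: 1 -> width
    h = np.maximum(0.0, x @ rng.normal(0.0, std(1), size=(1, width)))
    first_rms = np.sqrt(np.mean(h ** 2))
    # Remaining depth - 1 hidden layers: width -> width
    for _ in range(depth - 1):
        W = rng.normal(0.0, std(width), size=(width, width))
        h = np.maximum(0.0, h @ W)
    return np.sqrt(np.mean(h ** 2)) / first_rms

for width, s in [(200, 0.1), (800, 0.1), (800, "he")]:
    print(width, s, activation_growth(width, 20, s))
```

Note that for $n = 200$ the fixed choice $s = 0.1$ coincides with the He scale $\sqrt{2/200} = 0.1$, while for $n = 800$ it does not.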

(d) Now try modifying the architecture, e.g., the width, depth, and activation function, to obtain a better approximation of the discontinuous function

$$
f_2(x) = \begin{cases}
x + 5, & x < -\dots, \\
-x, & -\dots \le x < 0, \\
\dots, & 0 \le x < \dots, \\
\log(x) + 2, & \dots \le x.
\end{cases}
$$

This function is plotted in Fig. 1 and can be selected by setting “example = 2” in the first block of code. The baseline architecture we will compare against is a ReLU deep neural network with 5 hidden layers and 50 nodes per hidden layer.

Use this definition of $f_2$ for the following: for each $m_{\text{train}} \in \{200, 400, 600, 800, 1000\}$, train your selected DNN architecture on function data from $f_2$, comparing $\varepsilon_{\text{test}}$ to the test error computed with the ReLU DNN with 5 hidden layers and 50 nodes per hidden layer, i.e., plot the errors for both at each value of $m_{\text{train}}$ above. Record the architectures tested, what you observed for each, and the best-performing architecture overall. Use the third block of code to produce animations of the training process. What do you observe during training, in particular near the points of discontinuity?

Figure 1: Example of training a 20 × 800 ReLU DNN.