Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMP5328 Sample Final Exam

This is a sample fnal exam of COMP5328.

The information about the fnal exam of COMP5328 is provided below.

Duration: 3 hours and 10 minutes (190 minutes). mis includes 10 minutes of reading time, but you can start writing whenever you are ready.

Bufer time (Assignment): You will be allowed a bufer time of 15 minutes. mis means that you have 15 minutes afer the ’Due’date and time to still be able to submit your exam before the assignment closes. Please note that your assignment will be marked as a late submission if you submit afer the ’Due’date and time. If you are unable to submit your exam within 15 minutes of bufer time, you should apply for special consideration. Bufer time does NOT mean you have extra time to complete your exam.

Upload time: me fnal exam has 15 minutes of upload time added to the duration to allow you to upload images/workings etc. as per your exam instructions. Do NOT treat this as extra time. me upload time must be used solely to save and upload your documents correctly as per the exam instructions.

Format: me fnal exam is a take-home exam with 10 questions. You should read the questions, and FILL WORDS /INSERT IMAGES OF YOUR HAND-WRITTEN ANSWERS IN THE TEX TEMPLATE. For hand-writen answers, write them down on A4 paper clearly. Take images just covering the entire A4 paper. Only the image formats of“.png”, and “.jpeg”are accepted. You should atempt all questions and follow the instructions for each question carefully.

≈estion type

Points

Recommended

time spent

Part A

You are allowed to answer the questions in text only

40

60

Part B

You are allowed to answer the questions in text and/or image. Words on the images are only to explain the formula, the derivation, or the logic chain. If you want to explain concepts/statements by words, please type the words in the latex answer template.

60

120

Part A: You are allowed to answer the questions in text only.

Question 1 (Approximation error, estimation error) [10pts]

Suppose we have a task which is to identify diferent persons and a training dataset which contains pairs of human face image and identity. Two predefned hypothesis classes are given, which are linear classifers and deep neural networks, respectively.

1). Which hypothesis class has a larger Vapnik–Chervonenkis (VC) dimension? Explain why in details.

2). Which hypothesis class is more possible to produce a smaller approximation error? Explain why in details.

3). Which hypothesis class is more possible to produce a smaller estimation error when trained with this dataset? Explain why in details.

Question 2 (Causal Inference) [10pts]

We have a directed graphic causal model as follows.

[Note that you can use X-1, X-2, X-3,. . . ,X-4, X-5 and X-6 to refer to x1 , x2 x3 , x4 , x5 , and x6 , respectively. You can use NULL to denote an empty set }0|.]

1). Let x = }x2 | and Y = }x6 |, Provide a set which d-separates x and Y . Explain why in details.

2). Let x = }x1 , x2 | and Y = }x3 , x4 |, Provide a set d-separates x and Y . Explain why in details.

Part B: You are allowed to answer the questions in text and/or image.

Question 5 (Sparse coding) [10pts]

In sparse coding, we prefer sparse matrices.

1). Please briely explain the concept of sparsity.

2). me defnition of the lp norm is as follows:

k

lp  = lalp  = (      |aj |p )1/p ,

j=1

where a e Rk . Among l0 , l1 , l2 norms, which one is the best measure of sparsity? Please give a brief explanation.

3). Explain why the l0 norm is hard to be optimized. How to handle the problem?

Question 6 (Expected Risk and Empirical Risk) [10pts]

Let l be a loss function. We can defne the expected risk of a hypothesis h as

R(h) = _[l(x, y, h)].

And the corresponding empirical risk can be defned as

n

i=1

where s = }(x1 , y1 ), . . . , (xn , yn )| is a training sample consists of n pairs of i.i.d. random variables. Let H be the predefned hypothesis class. We further defne

f*  = argmin R(h),

h*H

fS  = argmin RS (h).

h*H

Prove that R(hS ) _ R(h* ) s 2 suph*H[RS (h) _ R(h)]. Show the detailed proof steps.

Question 7 (Optimization) [10pts]

Suppose we have a function f (h) where h is the hypothesis. By Taylor’s theorem, we have

f (hk+1) = f (hk ) + 75f (hk )T dk + o(7).

1). If we want to minimize the function value, the steepest gradient descent method tells us that we need to update our hypothesis in the direction of the negative gradient _5f (hk ). Please explain why.

2). Please explain how Armijo Backtracking line-search works. (Include some mathematical notation and formulas if necessary. You have to carefully explain the meaning of each notation used.)