ECS797P/U Machine Learning for Visual Data Analysis 2022

Published: 2022-08-05

Main Examination Period 2022 Semester B

ECS797P/U Machine Learning for Visual Data Analysis

Question 1

a) In Figure 1, we depict an image and the plot of the entropy of the intensity in a disk around a pixel A, as a function of the radius of the disk.

i) Make similar plots of the entropy around the pixels B and C with a brief explanation about the shape of the plot.

ii) Explain which of the points A, B and C will be selected as salient points, and at which scale.

Figure 1

[9 marks]
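For reference, the entropy-versus-radius curve described in part a) can be sketched as follows. This is a minimal illustration only: the function name `disk_entropy` and the 9x9 test image are assumptions made for the example and are not the image of Figure 1.

```python
import math
from collections import Counter

def disk_entropy(image, cy, cx, radius):
    """Shannon entropy (in bits) of the intensities inside a disk centred on (cy, cx)."""
    values = [
        image[y][x]
        for y in range(len(image))
        for x in range(len(image[0]))
        if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
    ]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical test images (not Figure 1): a vertical edge and a flat region.
edge = [[0] * 4 + [255] * 5 for _ in range(9)]
flat = [[7] * 9 for _ in range(9)]

# Entropy as a function of the disk radius around the centre pixel.
for r in (1, 2, 3, 4):
    print(r, round(disk_entropy(edge, 4, 4, r), 3), disk_entropy(flat, 4, 4, r))
```

In a uniform region the entropy stays at zero for every radius, whereas near an intensity boundary it rises once the disk covers both sides.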

b) Explain briefly the differences between Action Recognition, Temporal Action Localisation and Spatiotemporal Action Localisation. What is the input and what is the output of an algorithm in each case?

[6 marks]

c) A computer vision company designs an action recognition system that aims at counting how many times customers perform the action “drinking” in a pub setting, such as in Figure 2.

i) Explain why a part-based localisation method is a good choice in this setting.

ii) What kind of descriptors would you recommend to be extracted in the spatiotemporal interest point regions and why?

iii) Suggest how the developer of the system should choose the number of clusters in their visual codebook.

Figure 2

[8 marks]

Question 2

a) Describe what we mean by the term optical flow field and whether it is the same as the true motion field. What is the input and what is the output of an optical flow algorithm?

[6 marks]

b) Explain why the optical flow equation is not sufficient by itself to estimate an optical flow field. How can one overcome this problem?

[5 marks]

c) Explain how the optical flow equation is used in the Lucas-Kanade method for optical flow estimation.

[4 marks]
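As a point of reference for part c), the per-window least-squares step commonly associated with the Lucas-Kanade method can be sketched as below. It assumes the optical flow equation in the form Ix*u + Iy*v + It = 0 with one constant flow (u, v) per window. The function name and the derivative values are hypothetical, chosen only so the example runs.

```python
def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window, given per-pixel derivatives as lists."""
    sxx = sum(a * a for a in Ix)
    sxy = sum(a * b for a, b in zip(Ix, Iy))
    syy = sum(b * b for b in Iy)
    sxt = sum(a * t for a, t in zip(Ix, It))
    syt = sum(b * t for b, t in zip(Iy, It))
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt] in closed form.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("degenerate gradients in window (aperture problem)")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic derivatives (assumed values) that satisfy Ix*u + Iy*v + It = 0 exactly.
true_u, true_v = 0.5, -0.25
Ix = [1.0, 0.0, 2.0, 1.0]
Iy = [0.0, 1.0, 1.0, -1.0]
It = [-(a * true_u + b * true_v) for a, b in zip(Ix, Iy)]

u, v = lucas_kanade_window(Ix, Iy, It)
print(u, v)
```

The determinant check reflects the case where all gradients in the window point in the same direction, so the 2x2 system has no unique solution.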

d) Some Neural Network (NN) based methods for optical flow estimation, such as FlowNet, use a network to predict the optical flow field. Describe the first and the last layer of such networks. Describe what the supervisory signal (or optimisation criterion) is and how it is obtained, and state whether this is an instance of supervised or unsupervised Machine Learning.

[10 marks]

Question 3

a) Consider a database of 2000 face images of size 30 by 60. This database contains images of 40 people each having 50 images. Now consider applying Principal Component Analysis (PCA) to the database to construct 10 eigenfaces for face recognition.

i) What is the dimensionality of the covariance matrix of the dataset?

ii) What is the dimensionality of the mean face?

iii) What is the dimensionality of the eigenface?

iv) What is the dimensionality of the pattern vector?

[8 marks]
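The quantities named in part a) can be traced through a small NumPy sketch. It assumes each image is flattened to a row vector and that the covariance is taken over pixels. The pixel values are random placeholders, not real face data, and the variable names are illustrative.

```python
import numpy as np

n_images, height, width, n_components = 2000, 30, 60, 10

rng = np.random.default_rng(0)
# Placeholder data: one flattened image per row (random values, not real faces).
X = rng.random((n_images, height * width))

mean_face = X.mean(axis=0)                 # average of all flattened images
Xc = X - mean_face                         # centred data
cov = Xc.T @ Xc / (n_images - 1)           # pixel-by-pixel covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues returned in ascending order
eigenfaces = eigvecs[:, ::-1][:, :n_components]   # keep the 10 leading eigenvectors

# Pattern vector: projection of one centred image onto the eigenfaces.
pattern = eigenfaces.T @ (X[0] - mean_face)

print(cov.shape, mean_face.shape, eigenfaces[:, 0].shape, pattern.shape)
```

Printing the shapes makes it easy to check each dimensionality against the image size and the number of retained eigenfaces.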

b) We have 12 images belonging to 4 people. We apply face recognition and get the results shown in the table below. Compute the recognition rate and the confusion matrix.

Ground truth label	1	1	1	2	2	2	3	3	3	4	4	4
Predicted label	2	1	1	2	2	2	3	2	3	3	4	1

[5 marks]
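One way to tabulate the labels from the table in part b) is sketched below. The convention that rows index the ground truth and columns index the prediction is an assumption made here; the transposed layout is equally common.

```python
# Labels copied from the table in part b).
truth = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
pred = [2, 1, 1, 2, 2, 2, 3, 2, 3, 3, 4, 1]

labels = sorted(set(truth))
index = {label: i for i, label in enumerate(labels)}

# confusion[i][j] counts images of person labels[i] predicted as labels[j].
confusion = [[0] * len(labels) for _ in labels]
for t, p in zip(truth, pred):
    confusion[index[t]][index[p]] += 1

# Recognition rate: fraction of images on the diagonal.
correct = sum(confusion[i][i] for i in range(len(labels)))
rate = correct / len(truth)

for row in confusion:
    print(row)
print(f"recognition rate = {correct}/{len(truth)} = {rate:.3f}")
```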

c) Suppose you are designing an eigenface based face recognition system for a company. The system is trained using the face images of all the employees of the company. When a human passes the entrance of the company building, a camera will capture the face of the human and perform face recognition.

i) Which pre-processing technique is required before face recognition and why?

ii) How will you design the system so that it can determine whether: the human is an employee of the company; the human is not an employee of the company; a non-face object is captured by the camera.

[12 marks]

Question 4

a) Explain why the Viola-Jones algorithm is slow in training but very fast in detection.

[6 marks]

b) Fig. 3(a) depicts a Haar feature pattern (white colour denotes +1 and black denotes -1) and Fig. 3(b) depicts the pixel values of a 4x4 image. Fig. 3(a) and Fig. 3(b) are of the same size. Compute the Haar feature with the following steps.

i) Compute the integral image of Fig. 3(b)

ii) Show the procedure to compute the Haar feature value from the integral image.

Fig. 3(a) Haar feature

Fig. 3(b) Image

[10 marks]
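The two steps in part b) can be illustrated with the sketch below. The 4x4 pixel values and the two-rectangle pattern (left half white, right half black) are hypothetical, since the contents of Fig. 3 are not reproduced here. The helper names are also assumptions.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] = sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 from four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

# Hypothetical 4x4 image (NOT the values of Fig. 3(b)).
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)

# Hypothetical pattern (NOT Fig. 3(a)): left half white (+1), right half black (-1).
white = rect_sum(ii, 0, 0, 4, 2)
black = rect_sum(ii, 0, 2, 4, 4)
haar = white - black

print(white, black, haar)
```

Each rectangle sum needs only four lookups regardless of the rectangle size, which is what makes Haar features cheap to evaluate at detection time.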

c) Compare and contrast the information modelled in the Active Shape Model (ASM) vs the Active Appearance Model (AAM). Is the ASM suitable for modelling any deformable object or only faces? Explain your answers.

[9 marks]