INFR11140 IMAGE AND VISION COMPUTING

1. THIS QUESTION IS COMPULSORY

Short answer questions. Each question is worth 2.5 marks.

(a) Briefly describe what vanishing points and vanishing lines are in an image. Use a diagram to illustrate your answer.

(b) What linear transformation matrix will rotate an image by 45 degrees?

(c) Consider the binary image A, which is convolved with an unknown filter F to produce image B = F * A. What filter F was used?

(d) The mean-shift clustering algorithm appears to free the user from the responsibility of choosing the number of clusters. However, this is misleading. Why?

(e) You are designing a vision system for automated maintenance inspection of jet engines for faults that could lead to engine failure. It’s crucial that the system warns maintenance engineers of potential faults to avoid plane crashes. Which of the following is the most appropriate evaluation metric for this system: Accuracy, Precision, Recall, Loss value? Explain your choice.

(f) Explain the process of early stopping in neural network learning, and what it is designed to achieve.

(g) Outline the notion of Bayes error vs Human error for an image recognition problem. How do they relate and how could you estimate each?

(h) Explain the role of interest point detectors vs descriptors in computer vision pipelines.

(i) Why do we need non-max suppression in object detection systems? Illustrate with a diagram.

(j) Outline the role of transfer learning in CNN-based computer vision, and some hyper-parameters that may need to be tuned to use it effectively.

2. ANSWER EITHER THIS QUESTION OR QUESTION 3

This question is about image formation and low-level vision.

(a) With reference to this schematic of image formation:

i. Explain the concepts of (1) surface albedo, (2) diffuse (aka Lambertian) reflection, (3) specular reflection.

ii. For diffuse reflection, which viewing and incidence angles lead to maximum perceived brightness?

iii. For specular reflection, which viewing and incidence angles lead to maximum perceived brightness?

(b) Consider the 1D image A:

i. Consider applying a width-3 mean filter F twice to image A, where a, ..., g are variables representing pixel values. Specifically, apply the filter once to produce a smoothed image B = F * A, and then apply the same mean filter again to obtain C = F * B. Give an expression for the value of the center pixel of the output image C, where the mean filter has been applied twice.

ii. Derive a single 1D filter that, when applied once to the image, will produce the same result as applying the above mean filter twice.
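The equivalence asked about in part (b) can be checked numerically. The following is a minimal sketch, not a model answer: it assumes "valid" convolution (no padding), and the pixel values in `image` are hypothetical stand-ins for a, ..., g. It relies on the associativity of convolution, so filtering twice with F should match filtering once with F * F.

```python
def convolve_full(signal, kernel):
    """Full 1D convolution of two sequences (pure Python)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def convolve_valid(signal, kernel):
    """Keep only outputs where the kernel fully overlaps the signal."""
    full = convolve_full(signal, kernel)
    margin = len(kernel) - 1
    return full[margin:len(full) - margin]


mean3 = [1 / 3, 1 / 3, 1 / 3]            # width-3 mean filter F
combined = convolve_full(mean3, mean3)   # candidate single filter F * F

# Hypothetical pixel values standing in for a, ..., g.
image = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0]

twice = convolve_valid(convolve_valid(image, mean3), mean3)
once = convolve_valid(image, combined)

print(combined)
print(twice)
print(once)
```

Because the mean filter is symmetric, convolution and correlation coincide here, so the flip in the convolution definition does not affect the comparison.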

(c) ... to detect vertical and horizontal edges, respectively, by approximating the gradients of image I with respect to x and y position.

i. Such finite-difference estimates of the image gradient are susceptible to noise in the image. How can we increase robustness to noise in edge detection?

ii. Suppose we want to detect 45-degree diagonally oriented edges. Outline one approach to this that makes use of the above operators, and one that does not.

iii. Show that edge detection (and hence operations that depend on edge detectors, such as Harris) is invariant to additive global brightness changes.
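The invariance claim in part (c) iii can be illustrated numerically. This is a sketch under stated assumptions: the operators from the question are not reproduced in this copy, so a central-difference kernel [-1, 0, 1] is assumed, and the pixel values and brightness offset are hypothetical. A numerical check is not a proof, but it shows why the constant cancels in each difference.

```python
def central_difference(row):
    """Approximate the gradient at interior pixels as row[i+1] - row[i-1].

    Assumed operator: the question's own kernels are not shown here.
    """
    return [row[i + 1] - row[i - 1] for i in range(1, len(row) - 1)]


row = [10, 12, 20, 40, 41]          # hypothetical 1D image row
offset = 50                         # additive global brightness change
brighter = [p + offset for p in row]

grad = central_difference(row)
grad_brighter = central_difference(brighter)

# (row[i+1] + c) - (row[i-1] + c) == row[i+1] - row[i-1]
print(grad, grad_brighter)
```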

(d) You are using the Hough transform algorithm to detect and fit multiple lines in a single image, given its edge map as an input. You observe several challenges, including inaccurate line fitting (position and orientation estimation), false positive lines returned due to noise in the image, and slow runtime.

i. What parameters of the Hough transform can affect these properties of the algorithm?

ii. How might you tune these parameters to solve these problems?

iii. Which tradeoffs may arise in simultaneously solving these problems?

3. ANSWER EITHER THIS QUESTION OR QUESTION 2

This question is about high-level vision and convolutional neural networks.

(a) Consider a convolutional neural network (CNN) with 2 consecutive 5 × 5 convolutional layers, each with stride 1. Answer the following questions numerically as well as with a sketch illustrating your reasoning.

i. How large is the receptive field of a neuron in the 2nd non-image layer of this network?

ii. How large is the receptive field of a neuron in the 2nd non-image convolutional layer if we now insert a 2 × 2 max pooling between the two convolutional layers?

iii. How large is the receptive field of a neuron in the 2nd non-image convolutional layer if we now modify the second convolutional layer to use stride 2?
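The receptive-field reasoning in part (a) can be sketched with the commonly used recurrence: each layer grows the receptive field by (kernel - 1) times the cumulative stride of the layers before it. The helper name `receptive_field` and the assumption that the 2 × 2 max pooling uses stride 2 are illustrative choices, not part of the question.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per axis) of one output neuron.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    """
    size, jump = 1, 1  # jump = spacing of this layer's neurons in input pixels
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


two_convs = [(5, 1), (5, 1)]
with_pooling = [(5, 1), (2, 2), (5, 1)]   # assumes pooling stride 2
second_conv_stride_2 = [(5, 1), (5, 2)]

for name, layers in [("two convs", two_convs),
                     ("with pooling", with_pooling),
                     ("stride-2 second conv", second_conv_stride_2)]:
    print(name, receptive_field(layers))
```

Note that a layer's own stride changes the spacing of its outputs, not the size of the region each output sees; it only affects layers placed after it.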

(b) Consider a neural network for recognizing digits [0, ..., 9] that processes 100 × 100 RGB input images, with a 5 × 5 convolutional layer of stride 2 with 16 filters, followed by global max pooling and a fully connected layer. Ignoring padding, or assuming ‘same’ padding:

i. approximately how many operations (multiplications+additions) are used by the convolutional layer?

ii. approximately how many operations (multiplications+additions) are used by the FC layer?
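A rough way to organise the counting in part (b) is sketched below. The assumptions are loud ones: 'same' padding (so the output is ceil(input / stride) per axis), one multiplication plus one addition per weight, and comparisons in the pooling layer ignored. The function names are hypothetical helpers, and other counting conventions would give somewhat different totals.

```python
import math


def conv_ops(height, width, in_channels, kernel, stride, filters):
    """Approximate multiply+add count for a conv layer with 'same' padding."""
    out_h = math.ceil(height / stride)
    out_w = math.ceil(width / stride)
    # One multiply-accumulate per weight, per filter, per output position.
    macs = out_h * out_w * filters * kernel * kernel * in_channels
    return 2 * macs  # count the multiply and the add separately


def fc_ops(in_features, out_features):
    """Approximate multiply+add count for a fully connected layer."""
    return 2 * in_features * out_features


# Global max pooling leaves one value per filter, so the FC layer sees 16 inputs.
print(conv_ops(100, 100, in_channels=3, kernel=5, stride=2, filters=16))
print(fc_ops(in_features=16, out_features=10))
```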

(c) Suppose you are training a neural network to recognize letters [a, ..., z]. Your neural network uses a softmax output, cross-entropy loss, and 1-hot encoded labels y. Recall that cross-entropy loss is defined as $L(y, \hat{y}) = -y \cdot \log(\hat{y})$.

i. You initialize your weights to mean 0, standard deviation 1e-6, but accidentally set your learning rate to zero. What predictions do you expect your network to make after the first epoch? What do you expect your average loss to be after the first epoch?

ii. After correcting your learning rate bug and training the model, it now converges and predicts $\hat{y}$ = [0.1, 0.5, 0.4, 0, ...] on a validation image, whose true label is y = [0, 0, 1, 0, 0, ...]. What is the loss value for this image?

iii. Later on, you want to upgrade your system to allow multiple letters to be detected in a single input image. You switch to using multi-hot labels so that, for example y = [1, 0, 1, 0, 0, . . .] would mean that both a and c are present. To save time you reuse the previous architecture and loss and retrain. Why is this problematic? What could you change to solve this problem?
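The quantities in part (c) can be computed with a few lines of code. This is a minimal sketch assuming the natural logarithm and the loss taken as the negative sum of y_i log(ŷ_i) over classes; the small `eps` clamp is an added numerical safeguard against log(0), not part of the definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy between a label vector y and predicted probabilities y_hat."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y, y_hat))


num_classes = 26
label_c = [0.0] * num_classes
label_c[2] = 1.0  # 1-hot label for the letter 'c'

# Near-zero weights give near-equal logits, hence near-uniform predictions.
uniform_prediction = softmax([0.0] * num_classes)
print(cross_entropy(label_c, uniform_prediction))

# Prediction from part ii, padded with zeros for the remaining classes.
prediction = [0.1, 0.5, 0.4] + [0.0] * (num_classes - 3)
print(cross_entropy(label_c, prediction))
```

With 1-hot labels only the true class contributes to the sum, so the loss reduces to the negative log of the probability assigned to that class.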

(d) Suppose you are training a neural network to recognize different vegetables for a grocery sorting system.

i. Before starting, you have to decide the image resolution to use for data collection. What does this impact? How might you decide the minimum acceptable resolution to use?

ii. You request a dataset of 10,000 carrot and tomato images from the sorting factory. You start prototyping by training a CNN on a small set of 100 images. Training converges, but the training loss is high. A colleague suggests solving this by using the full training set of 10,000 images. Is this approach likely to help? Describe the likely outcome of training with 10,000 images.

iii. Later you manually inspect the data and find that you were given 9990 carrot images and only 10 tomato images. What is likely to happen when training on this data? How can you solve this?

iv. You also discover that your initial train and validation splits were made by splitting the data in its original order, which followed time of collection. The train data is well lit daytime data, and your validation data is poorly lit evening data. What will this do? How can you solve it?