CS 755/85 Computer Vision Spring 2022

Published: 2022-05-03

Final Exam: Bag of Words for Image Recognition

Introduction:

You will implement an image recognition pipeline using the bag of visual words (BoVW) approach. Use the starter code.

Forbidden functions: bagOfFeatures(), evaluateImageRetrieval() [You will lose all points if either of these two functions appears in your code]

Database:

You will use the 15-scene database introduced in a CVPR 2006 paper by Lazebnik et al. The paper is available along with the dataset on the course website. The dataset contains natural scenes from 15 classes, namely Office, Kitchen, Living room, Bedroom, Store, Industrial, Tall building, Inside city, Street, Highway, Coast, Open country, Mountain, Forest, and Suburb. Each class has 100 training examples and 100 test examples. For this assignment you will use this fixed split for training and testing. The screenshot below shows some of the example images from different classes. The starred categories are from another database known as the 8-scene database.

 

You will use the Bag of Visual Words approach to classify the test images into one of the 15 categories.

Task 1 [6 points]. Build a codebook using the training images: Write a function, build_codebook(), which will extract features from the training images and cluster them with the K-means clustering algorithm. The cluster centers identified by the K-means algorithm will form your codebook. A few things to consider as you write this function:

1. You can use any feature descriptor functions available in Matlab. To keep the computation manageable, the number of descriptors per image should not be too high (the exact number, however, is a design choice).

2. Use all 100 training images per category.

3. For K-means clustering, you can use the Matlab function kmeans(). Be familiar with the parameters of this function to obtain the best clustering performance. For the value of K, start with a value between 150 and 200 and increase or decrease it if necessary (based on the overall performance). In general, higher values of K will make a better codebook but will make the computation very slow. Therefore, exercise caution in increasing the value of K.
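
The clustering step of Task 1 can be illustrated with a minimal sketch. It is written in plain Python rather than Matlab, purely to show the logic, and it is not the starter code or a replacement for kmeans(). The function name build_codebook_sketch, its parameters, and the toy 2-D descriptors are all assumptions made for this example. Real descriptors would have many more dimensions, and K would be far larger than 2.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook_sketch(descriptors, k, iterations=20, seed=0):
    """Cluster descriptors with plain Lloyd's K-means and return k centers.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise the centers with k descriptors drawn without replacement.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: group descriptors by their nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda j: squared_distance(d, centers[j]))
            clusters[nearest].append(d)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[i] for m in members) / len(members) for i in range(dim)]
                )
            else:
                # Keep the old center if no descriptor was assigned to it.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    return centers


# Toy 2-D "descriptors" forming two well-separated groups (made-up data).
toy_descriptors = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
codebook = build_codebook_sketch(toy_descriptors, k=2)
```

Each returned center plays the role of one codeword. The cost of the assignment step grows with both K and the number of descriptors, which is why the task advises caution when increasing K.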

Task 2 [6 points]. Building BoVWs: Write a function, create_bovw(), that will generate BoVWs for each training and test image. For this you will need to extract feature descriptors for the test images as well. Follow the strategy of Task 1 for this. For any image (train or test), the create_bovw() function will create a histogram that indicates how many times each codeword (i.e., a K-means cluster center) was used by the feature descriptors of that image. Don't forget to normalize the histogram, or else a larger image with more feature descriptors will look very different from a smaller version of the same image. You can use Matlab's histogram() function for that.
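
The histogram construction and the effect of normalization can be sketched as follows. This is again plain Python for illustration, not Matlab. The name create_bovw_sketch, the choice of L1 normalization (dividing by the total count), and the toy data are assumptions for this example. The task itself only requires that the histogram be normalized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def create_bovw_sketch(descriptors, codebook):
    """Return an L1-normalised histogram of nearest-codeword counts.

    Hypothetical helper for illustration only.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        # Each descriptor votes for the codeword closest to it.
        nearest = min(
            range(len(codebook)), key=lambda j: squared_distance(d, codebook[j])
        )
        counts[nearest] += 1
    total = sum(counts)
    # Guard against an image that produced no descriptors at all.
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Made-up codebook with two codewords and two made-up "images".
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
small_image = [[0.1, 0.2], [9.8, 10.1], [0.3, 0.0], [0.2, 0.4]]
large_image = small_image * 3  # same content, three times as many descriptors

bovw_small = create_bovw_sketch(small_image, toy_codebook)
bovw_large = create_bovw_sketch(large_image, toy_codebook)
```

Both toy images yield the same normalized histogram even though the raw counts differ by a factor of three, which is the behaviour the normalization requirement is meant to guarantee.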

Task 3 [6 points]. Recognition and performance reporting: Write a function, my_nn_classifier(), which will predict the class label for every test image by finding the training image with the most similar features. Therefore, the input parameters to this function can be the BoVW representations of all training and test images and the labels of the training images. The function will output the class labels of the test images. You can use Matlab's KNN classifier (fitcknn()). Start with K=3 and go up to K=7 for the best performance (unless the computation becomes very heavy). You can use any Matlab function for distance measurement, comparison, and sorting.
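
A nearest-neighbour vote over BoVW histograms can be sketched as follows, in plain Python for illustration rather than with fitcknn(). The name my_nn_classifier_sketch, the squared Euclidean distance, and the toy histograms are assumptions for this example. Other distance metrics may well perform better on real data.

```python
from collections import Counter


def my_nn_classifier_sketch(train_bovw, train_labels, test_bovw, k=3):
    """Predict a label for each test histogram by majority vote of its k
    nearest training histograms (squared Euclidean distance).

    Hypothetical helper for illustration only.
    """
    predictions = []
    for query in test_bovw:
        # Rank all training images from nearest to farthest.
        order = sorted(
            range(len(train_bovw)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(query, train_bovw[i])),
        )
        # Count the labels of the k nearest neighbours. On a tied vote,
        # the label encountered first (the nearer neighbour) wins.
        votes = Counter(train_labels[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Made-up two-bin histograms for two of the 15 classes.
train_bovw = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],
]
train_labels = ["Coast", "Coast", "Coast", "Forest", "Forest", "Forest"]
test_bovw = [[0.85, 0.15], [0.25, 0.75]]

predicted = my_nn_classifier_sketch(train_bovw, train_labels, test_bovw, k=3)
```

This brute-force search compares every test image with every training image, so its cost grows with the product of the two set sizes and the histogram length.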

The performance of your classifier is calculated as:

Score = (# of correct classifications) / (Total # of samples)

Write a function my_score() which will report the performance score for each of the 15 classes and generate a confusion matrix. A discussion of the confusion matrix can be found here: https://en.wikipedia.org/wiki/Confusion_matrix. Expect an overall score of approximately 40%, but you can achieve up to 60% with a good descriptor, a well-designed K-NN classifier, and a suitable distance metric.
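
The score and the confusion matrix can be computed together, as in the following plain-Python sketch. The name my_score_sketch, the row and column convention (rows are true classes, columns are predicted classes), and the toy labels are assumptions for this example.

```python
def my_score_sketch(true_labels, predicted_labels, classes):
    """Return (overall score, per-class scores, confusion matrix).

    Rows of the confusion matrix are true classes and columns are
    predicted classes. Hypothetical helper for illustration only.
    """
    index = {c: i for i, c in enumerate(classes)}
    confusion = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        confusion[index[t]][index[p]] += 1

    # Per-class score: correct predictions for a class divided by the
    # number of test samples that truly belong to that class.
    per_class = {}
    for c in classes:
        row = confusion[index[c]]
        row_total = sum(row)
        per_class[c] = row[index[c]] / row_total if row_total else 0.0

    # Overall score: diagonal sum divided by the total number of samples.
    correct = sum(confusion[i][i] for i in range(len(classes)))
    overall = correct / len(true_labels)
    return overall, per_class, confusion


# Made-up labels for three of the 15 classes.
classes = ["Coast", "Forest", "Street"]
true_labels = ["Coast", "Coast", "Forest", "Forest", "Street", "Street"]
predicted_labels = ["Coast", "Forest", "Forest", "Forest", "Street", "Coast"]

overall, per_class, confusion = my_score_sketch(true_labels, predicted_labels, classes)
```

In this toy case four of the six samples are classified correctly, and the off-diagonal entries show which classes were confused with each other.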

Task 4 [12 points]. Report and code: The purpose of the report is to describe the work that you did in your code. The report should discuss in detail how you accomplished the three tasks, with graphics, equations, and code snippets as necessary. Discuss all parameter choices, what worked, and what did not. Your code should be well commented and run out of the box. You will lose points if your code generates results that your report does not discuss.