CS 755/85 Computer Vision Spring 2022

Published: 2022-05-03

Final Exam: Bag of Words for Image Recognition

Introduction:

You will implement an image recognition pipeline using the bag of visual words (BoVW) approach. Use the starter code.

Forbidden functions: bagOfFeatures(), evaluateImageRetrieval() [You will lose all points if either of these two functions appears in your code]

Database:

You will use the 15-scene database introduced in a CVPR 2006 paper by Lazebnik et al. The paper is available along with the dataset on the course website. The dataset contains natural scenes from 15 classes, namely Office, Kitchen, Living room, Bedroom, Store, Industrial, Tall building, Inside city, Street, Highway, Coast, Open country, Mountain, Forest, and Suburb. Each class has 100 training examples and 100 test examples. For this assignment, you will use this fixed split for training and testing. The screenshot below shows some example images from different classes. The starred categories are from another database, known as the 8-scene database.

You will use the Bag of Visual Words approach to classify the test images into one of the 15 categories.

Task 1 [6 points]. Build a codebook using the training images: Write a function, build_codebook(), which will extract features from the training images and cluster them with the K-means clustering algorithm. The cluster centers identified by K-means will form your codebook. A few things to consider as you write this function:

1. You can use any feature descriptor function available in Matlab. To keep the computation manageable, the number of descriptors per image should not be too high (the exact number, however, is a design choice).

2. Use all 100 training images per category.

3. For K-means clustering, you can use the Matlab function kmeans(). Familiarize yourself with the parameters of this function to obtain the best clustering performance. For the value of K, start with a value between 150 and 200 and increase or decrease it if necessary (based on the overall performance). In general, higher values of K will make a better codebook but will make the computation very slow. Therefore, exercise caution in increasing the value of K.
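The considerations above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: random 64-dimensional vectors stand in for real feature descriptors, and the names numImages, maxPerImage, and pooled are hypothetical. It only shows how a per-image descriptor cap and the kmeans() options fit together (kmeans() requires the Statistics and Machine Learning Toolbox).

```matlab
% Sketch: pool a capped number of descriptors per image, then cluster them.
rng(0);                     % reproducible toy data and K-means initialization
numImages   = 20;           % toy stand-in for the training set (hypothetical)
maxPerImage = 100;          % hypothetical cap on descriptors kept per image
K           = 150;          % codebook size, low end of the suggested range

pooled = [];
for i = 1:numImages
    d = rand(300, 64);      % stand-in for the descriptors of image i
    % Keep at most maxPerImage randomly chosen descriptors from this image.
    keep = randperm(size(d, 1), min(maxPerImage, size(d, 1)));
    pooled = [pooled; d(keep, :)]; %#ok<AGROW>
end

% 'MaxIter' and 'Replicates' trade clustering quality against run time.
[~, codebook] = kmeans(pooled, K, 'MaxIter', 200, 'Replicates', 2);

disp(size(codebook));       % K rows (one per visual word), 64 columns
```

Raising 'Replicates' reruns K-means from several initializations and keeps the best result, which tends to help more than a larger K when run time is limited.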

Task 2 [6 points]. Building BoVWs: Write a function, create_bovw(), that will generate a BoVW for each training and test image. For this, you will need to extract feature descriptors for the test images as well, following the strategy of Task 1. For any image (train or test), create_bovw() will create a histogram that indicates how many times each codeword (i.e., a K-means cluster center) was used by the feature descriptors of that image. Don't forget to normalize the histogram, or else a larger image with more feature descriptors will look very different from a smaller version of the same image. You can use Matlab's histogram() function for that.
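As a rough illustration of the histogram step, the sketch below assigns each descriptor of one image to its nearest codeword and normalizes the counts. The data are synthetic and the variable names are hypothetical. It uses histcounts(), the non-plotting counterpart of histogram(), and pdist2() from the Statistics and Machine Learning Toolbox; both are choices made here for illustration, not requirements of the assignment.

```matlab
% Sketch: turn one image's descriptors into a normalized BoVW histogram.
rng(1);
codebook = rand(5, 64);     % toy codebook with K = 5 visual words
imgDesc  = rand(40, 64);    % toy descriptors from a single image
K = size(codebook, 1);

% Assign each descriptor to its nearest codeword (Euclidean distance).
[~, wordIdx] = min(pdist2(imgDesc, codebook), [], 2);

% Count how often each codeword was used: one unit-width bin per codeword.
counts = histcounts(wordIdx, 0.5:1:(K + 0.5));

% Normalize so the histogram sums to 1, regardless of descriptor count.
bovw = counts / sum(counts);
```

Because bovw sums to 1, two images with different numbers of descriptors remain directly comparable.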

Task 3 [6 points]. Recognition and performance reporting: Write a function, my_nn_classifier(), which will predict the class label for every test image by finding the training images with the most similar features. The input parameters to this function can therefore be the BoVW representations of all training and test images and the labels of the training images. The function will output the class labels of the test images. You can use Matlab's KNN classifier (fitcknn()). Here K is the number of neighbors, not the codebook size: start with K=3 and go up to K=7 for the best performance (unless the computation becomes very heavy). You can use any Matlab function for distance measurement, comparison, and sorting.
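A minimal sketch of the classification step, assuming BoVW histograms are stored one per row and class labels are numeric. The toy data, the variable names, and the choice of the 'cityblock' distance are illustrative assumptions only; fitcknn() and predict() come from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: K-NN prediction on toy BoVW histograms (each row sums to 1).
trainBovw   = [0.8 0.1 0.1; 0.7 0.2 0.1; 0.6 0.3 0.1; ...
               0.1 0.1 0.8; 0.2 0.1 0.7; 0.1 0.3 0.6];
trainLabels = [1; 1; 1; 2; 2; 2];       % one numeric label per training row
testBovw    = [0.75 0.15 0.10; 0.15 0.15 0.70];

% 'NumNeighbors' and 'Distance' are the main design choices to explore.
mdl = fitcknn(trainBovw, trainLabels, ...
              'NumNeighbors', 3, 'Distance', 'cityblock');

% One predicted label per test row.
predLabels = predict(mdl, testBovw);
```

In this toy case the three nearest neighbors of each test row all belong to a single class, so the two rows are assigned to classes 1 and 2 respectively.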

The performance of your classifier is calculated as:

Score = (# of correct classifications) / (Total # of samples)

Write a function, my_score(), which will report the performance score for each of the 15 classes and generate a confusion matrix. A discussion of confusion matrices can be found here: https://en.wikipedia.org/wiki/Confusion_matrix. Expect an overall score of approximately 40%, but you can achieve up to 60% with a good descriptor, a well-designed K-NN classifier, and a suitable distance metric.
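The scoring can be sketched on a toy three-class example as follows. The labels and variable names are made up for illustration, and confusionmat() (Statistics and Machine Learning Toolbox) is one possible way to build the matrix, not a required one.

```matlab
% Sketch: overall score, per-class scores, and confusion matrix.
trueLabels = [1; 1; 1; 2; 2; 2; 3; 3; 3];
predLabels = [1; 1; 2; 2; 2; 2; 3; 1; 3];

% Rows of C are true classes, columns are predicted classes.
C = confusionmat(trueLabels, predLabels);

% Overall score: correct classifications / total number of samples.
overallScore = sum(diag(C)) / sum(C(:));

% Per-class score: correct classifications / samples of that class.
perClassScore = diag(C) ./ sum(C, 2);
```

Here 7 of the 9 samples are classified correctly, so the overall score is 7/9, and the per-class scores are 2/3, 1, and 2/3. Off-diagonal entries of C show which classes are confused with which.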

Task 4 [12 points]. Report and code: The purpose of the report is to describe the work that you did in your code. The report should discuss in detail how you accomplished the three tasks, with graphics, equations, and code snippets as necessary. Discuss all parameter choices, including what worked and what did not. Your code should be well commented and run out of the box. You will lose points if your code generates results that your report does not discuss.