Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

DS 310 Machine Learning

Fall 22

Problem Set 2: Nearest Neighbor Classifier

1.  (25 pts.) Suppose a 5 nearest neighbor search for a query sample q returns {7, 8, 7, 8, 9} as the values of the function at the 5 nearest data samples in the training set. Assuming prediction using 5 nearest neighbor interpolation, what is the predicted value of the function for the query sample?

2.  (25 pts.) Consider the following database of patients with specific symptoms diagnosed

by a physician.      Training Example

1

2

3

4

5

6

7

Fever

Nausea

Diarrhea

Chills

no

no

no

no

average

no

no

no

high

no

no

yes

high

yes

yes

no

average

no

yes

no

no

yes

yes

no

average

yes

yes

no

Classification

Healthy

Flu

Flu

Salmonella

Salmonella

IBD

IBD

(a) Design a suitable representation for this data set.

(b) Define a suitable distance measure.

(c)  Calculate the distance of a query instance (high, no, no, no) from all of the training data points.

(d)  Classify the preceding query instance using 3-nearest-neighbor classifier trained on the training data.

3.  (25 pts.) Suppose you are given a data set consisting of text documents that have been labeled as belonging to one of two classes (+ denoting interesting” and denoting ”not interesting”). Suppose the vocabulary used in the document consists of V distinct words w1 ··· wV . Suppose you want to construct a nearest-neighbor classifier from this data set.

(a) Design a suitable representation for documents that is amenable for use with a nearest-neighbor classifier.

(b) Define a suitable distance measure to use with a nearest-neighbor classifier.

4.  (25 pts.) Suppose you are given document collection where each document consists of both text and pictures. Suppose each document has been labeled as belonging to one of two classes (+ denoting interesting” and denoting not interesting”). Suppose the vocabulary used in the document consists of V distinct words w1 ··· wV . Suppose the pictures are made of a visual vocabulary of F distinct identifiable features. Suppose you want to construct a nearest-neighbor classifier from this data set.  How would you go about designing a nearest-neighbor classifier in this case? Explain with pseudo-code.