DS 310 Machine Learning Fall 22 Problem Set 2
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
DS 310 Machine Learning
Fall 22
Problem Set 2: Nearest Neighbor Classifier
1. (25 pts.) Suppose a 5 nearest neighbor search for a query sample q returns {7, 8, 7, 8, 9} as the values of the function at the 5 nearest data samples in the training set. Assuming prediction using 5 nearest neighbor interpolation, what is the predicted value of the function for the query sample?
2. (25 pts.) Consider the following database of patients with specific symptoms diagnosed
by a physician. Training Example
1
2
3
4
5
6
7
Fever |
Nausea |
Diarrhea |
Chills |
no |
no |
no |
no |
average |
no |
no |
no |
high |
no |
no |
yes |
high |
yes |
yes |
no |
average |
no |
yes |
no |
no |
yes |
yes |
no |
average |
yes |
yes |
no |
Classification
Healthy
Flu
Flu
Salmonella
Salmonella
IBD
IBD
(a) Design a suitable representation for this data set.
(b) Define a suitable distance measure.
(c) Calculate the distance of a query instance (high, no, no, no) from all of the training data points.
(d) Classify the preceding query instance using 3-nearest-neighbor classifier trained on the training data.
3. (25 pts.) Suppose you are given a data set consisting of text documents that have been labeled as belonging to one of two classes (+ denoting ”interesting” and − denoting ”not interesting”). Suppose the vocabulary used in the document consists of V distinct words w1 ··· wV . Suppose you want to construct a nearest-neighbor classifier from this data set.
(a) Design a suitable representation for documents that is amenable for use with a nearest-neighbor classifier.
(b) Define a suitable distance measure to use with a nearest-neighbor classifier.
4. (25 pts.) Suppose you are given document collection where each document consists of both text and pictures. Suppose each document has been labeled as belonging to one of two classes (+ denoting ”interesting” and − denoting ”not interesting”). Suppose the vocabulary used in the document consists of V distinct words w1 ··· wV . Suppose the pictures are made of a visual vocabulary of F distinct identifiable features. Suppose you want to construct a nearest-neighbor classifier from this data set. How would you go about designing a nearest-neighbor classifier in this case? Explain with pseudo-code.
2022-09-08