Econ 128 – Machine Learning

Take-home Midterm

DUE TUESDAY, Jan 24, 2024 @9pm by email!

Instructions.

.    Please read Ch 2 and 3 in the textbook. Ch 3 covers linear regression and the material should be familiar from Econ 123A.

.    Submit the solutions as a notebook. I should be able to run the notebook as long as I change the path to the main file. If the notebook does not run for any reason, you will be graded ONLY up to the block of code that runs.

.    You can submit solutions individually or as a group. If you submit as a group, you will be graded as a group, which may impact the overall distribution of grades.

.    Use the SoCalCars dataset from Lab 1.

You are asked to use the data to develop an algorithm that predicts whether a car is a “Good Deal”.

1.  [10 pts + 10 pts for best solution] Develop a linear probability model using regression analysis. You can choose any predictors you want. Compute the R² and MSE for your model. The person/team who obtains the best predictor in the training data earns an additional 10 pts.

2.  [10 pts] Predict the probability that a car is a “good deal”. What fraction of the cars is predicted to have probability <0 or >1?

3.  [20 pts] Using a test dataset equal to 20% of the full dataset, compute the threshold probability p* such that you decide a car is a good deal whenever its predicted probability > p*. (Choose 5 values and then find the best one of these values. Don’t try to explore all possible values.)

4.  [30 pts] Implement the k-nearest neighbor algorithm to predict whether a car is a good deal. Compute the optimal k* using a test dataset equal to 20% of the full dataset. (Choose 5 values and then find the best one of these values. Don’t try to explore all possible values.)

5.  [10 pts + 10 pts for best solution] Compare the performance of your regression and KNN algorithms on the test dataset at the values of p* and k* you have chosen. Report the performance. The person/team with the best performance earns an additional 10 pts.
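
The workflow in questions 1 to 5 can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not a solution: it uses synthetic data in place of the SoCalCars file, the two predictors are hypothetical stand-ins, the candidate values for p* and k* are arbitrary, and classification accuracy is assumed as the performance measure because the assignment does not name one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the SoCalCars data (ASSUMPTION: the real columns differ).
n = 500
X = rng.normal(size=(n, 2))   # two hypothetical standardized predictors
y = (X @ np.array([1.0, -0.8]) + rng.normal(size=n) > 0).astype(float)  # 1 = "Good Deal"

# Hold out 20% of the observations as the test dataset.
idx = rng.permutation(n)
n_test = n // 5
test, train = idx[:n_test], idx[n_test:]

def add_const(A):
    """Prepend a column of ones so the regression has an intercept."""
    return np.column_stack([np.ones(len(A)), A])

# Q1: linear probability model = OLS of the 0/1 outcome on the predictors.
beta, *_ = np.linalg.lstsq(add_const(X[train]), y[train], rcond=None)
p_train = add_const(X[train]) @ beta
p_test = add_const(X[test]) @ beta

resid = y[train] - p_train
mse = np.mean(resid ** 2)
r2 = 1.0 - np.sum(resid ** 2) / np.sum((y[train] - y[train].mean()) ** 2)

# Q2: share of fitted probabilities that fall outside [0, 1].
frac_outside = np.mean((p_train < 0) | (p_train > 1))

def accuracy(y_true, y_pred):
    """Fraction of correct classifications (ASSUMED performance measure)."""
    return np.mean(y_true == y_pred)

# Q3: evaluate five candidate thresholds on the test set and keep the best.
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
acc_p = {p: accuracy(y[test], (p_test > p).astype(float)) for p in thresholds}
p_star = max(acc_p, key=acc_p.get)

def knn_predict(X_tr, y_tr, X_new, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_new[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_tr[nearest].mean(axis=1) > 0.5).astype(float)

# Q4: evaluate five candidate k values (odd, so votes cannot tie).
ks = [1, 5, 9, 15, 25]
acc_k = {k: accuracy(y[test], knn_predict(X[train], y[train], X[test], k)) for k in ks}
k_star = max(acc_k, key=acc_k.get)

# Q5: compare the two classifiers at the chosen p* and k*.
print(f"LPM: R2={r2:.3f}, MSE={mse:.3f}, outside [0,1]={frac_outside:.3f}")
print(f"LPM test accuracy at p*={p_star}: {acc_p[p_star]:.3f}")
print(f"KNN test accuracy at k*={k_star}: {acc_k[k_star]:.3f}")
```

With real data the predictors would typically need to be put on a common scale before computing distances for KNN, and a different performance measure may be more appropriate if good deals are rare.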