MTH 496 – Machine Learning
Final Project
Due date: 5:00 pm, Friday, Dec 11, 2020
November 17, 2020
Note 1: Select only one of the following projects as your final project. Alternatively, you may propose your own project.
Note 2: If you give a final presentation, 15 points will be added to your final grade (100 pts in total). Opportunities are limited; email me before Nov 19 if you plan to give a presentation.
Project 1. There are 29 datasets generated from real applications. Choose one dataset as the target training/test set and use the methods covered this semester to study it. Finally, write a report on your study of the machine learning methods and the target dataset.
• Methods: Use methods provided by the Python sklearn package to test the target dataset. Methods should include linear regression, logistic regression, k-nearest neighbors, k-means, support vector machine (SVM) with kernels, random forest, and gradient boosting decision tree. You should discuss these methods' accuracy (RMSE, Kendall tau, etc.), performance (CPU time, etc.), and applicability (classification or regression) on the dataset.
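As a sketch of the overall testing workflow, the loop below times one method and computes its RMSE. Synthetic data from make_regression stands in for your chosen dataset, and the variable names (X_train, pred, etc.) are illustrative, not required:

```python
import time
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the chosen dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

start = time.process_time()          # CPU time, not wall-clock time
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
cpu_time = time.process_time() - start

rmse = mean_squared_error(y_test, pred) ** 0.5   # root mean squared error
```

The same fit/predict/score pattern applies to every method below, so accuracy and CPU time can be tabulated method by method.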
1. Linear regression: regression method
scikit-learn webpage: class sklearn.linear_model.LinearRegression()
from sklearn.linear_model import LinearRegression
LR = LinearRegression()
LR.fit(X_train, y_train)
pred = LR.predict(X_test)
2. Logistic regression: classification method
scikit-learn webpage: class sklearn.linear_model.LogisticRegression()
from sklearn.linear_model import LogisticRegression
LogisticR = LogisticRegression()
LogisticR.fit(X_train, y_train)
pred = LogisticR.predict(X_test)
3. k-NN: regression and classification method
scikit-learn webpage: class sklearn.neighbors.KNeighborsRegressor() and class sklearn.neighbors.KNeighborsClassifier()
from sklearn.neighbors import KNeighborsRegressor
KNNr = KNeighborsRegressor(n_neighbors=3)
KNNr.fit(X_train, y_train)
pred = KNNr.predict(X_test)
from sklearn.neighbors import KNeighborsClassifier
KNNc = KNeighborsClassifier(n_neighbors=3)
KNNc.fit(X_train, y_train)
pred = KNNc.predict(X_test)
4. Naive Bayes: classification method
scikit-learn webpage: Naive Bayes
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import ComplementNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import CategoricalNB
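The imports above only construct the classifiers; as a minimal sketch of how one of them is actually used (on synthetic data, with illustrative variable names), GaussianNB follows the same fit/predict pattern as the other methods, and the remaining NB variants share this API:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic two-class problem standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
pred = gnb.predict(X_test)
acc = accuracy_score(y_test, pred)   # fraction of correctly classified test points
```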
5. K-Means: clustering method, unsupervised
scikit-learn webpage: sklearn.cluster.KMeans()
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(X_train)
pred = kmeans.predict(X_test)
6. SVM: regression and classification method
scikit-learn webpage: sklearn.svm.SVC and sklearn.svm.SVR
Parameters: kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf'
Support Vector Machines examples
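Unlike the entries above, no code is given for SVM, so here is a minimal sketch of both variants on synthetic data (variable names are illustrative): SVC for classification with the default RBF kernel, and SVR for regression with a linear kernel.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR

# Classification with the default RBF kernel
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=0)
svc = SVC(kernel='rbf')
svc.fit(Xc_train, yc_train)
pred_c = svc.predict(Xc_test)

# Regression with a linear kernel
Xr, yr = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
svr = SVR(kernel='linear')
svr.fit(Xr_train, yr_train)
pred_r = svr.predict(Xr_test)
```

Trying several kernel choices and comparing their accuracy is a natural part of the parameter discussion the report asks for.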
7. Random Forest: regression and classification method
scikit-learn webpage: class sklearn.ensemble.RandomForestRegressor and class sklearn.ensemble.RandomForestClassifier
Parameters:
➔ n_estimators: default=100
➔ max_depth: default=None
➔ min_samples_split: default=2
➔ criterion: {'mse', 'mae'} for regressor and {'gini', 'entropy'} for classifier
➔ min_samples_leaf: default=1
Check the webpage for examples.
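As a sketch, the classifier variant can be run with the parameters listed above spelled out explicitly (synthetic data; variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Defaults written out so each parameter can be varied and discussed
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            criterion='gini', random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
```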
8. Gradient Boosting Decision Tree: regression and classification method
scikit-learn webpage: class sklearn.ensemble.GradientBoostingRegressor and class sklearn.ensemble.GradientBoostingClassifier
Parameters:
➔ loss: {'ls', 'lad', 'huber', 'quantile'}, default='ls' for regressor and {'deviance', 'exponential'}, default='deviance' for classifier
➔ learning_rate: default=0.1
➔ n_estimators: default=100
➔ max_depth: default=3
➔ min_samples_split: default=2
➔ criterion: {'friedman_mse', 'mse', 'mae'}, default='friedman_mse'
➔ min_samples_leaf: default=1
Examples for gradient boosting decision tree
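A minimal sketch of the regressor variant, with the key parameters from the list above set to their defaults (synthetic data; variable names are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learning_rate and n_estimators trade off against each other;
# varying them is a good subject for the parameter discussion
gbr = GradientBoostingRegressor(learning_rate=0.1, n_estimators=100,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
pred = gbr.predict(X_test)
```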
9. Error analysis:
➔ sklearn.metrics.mean_squared_error
➔ scipy.stats.pearsonr
➔ scipy.stats.kendalltau
➔ from sklearn.metrics import accuracy_score
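The four metrics above can be computed on a toy example like this (the arrays are made-up illustrations, not results from any dataset):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import pearsonr, kendalltau

# Toy regression predictions vs. true values
y_true = np.array([3.0, 2.5, 4.0, 5.1, 1.2])
y_pred = np.array([2.8, 2.7, 3.9, 5.0, 1.5])

rmse = mean_squared_error(y_true, y_pred) ** 0.5   # RMSE
r, _ = pearsonr(y_true, y_pred)                    # Pearson correlation
tau, _ = kendalltau(y_true, y_pred)                # Kendall rank correlation

# accuracy_score is for classification labels, not continuous values
acc = accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])
```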
Note: Your parameter settings should be discussed.
• Datasets: 1. Wine Quality, 2. Bike Data, 3. Kobe Bryant, 4. West Nile virus, 5. Year Music prediction, 6. Income classification, 7. Bank marketing, 8. Credit card clients, 9. Drug consumption, 10. Geographical Original of Music, 11. HTRU2, 12. Letter recognition, 13. Mice protein expression, 14. Occupancy detection, 15. Online news popularity, 16. Ozone level detection, 17. Phishing websites, 18. Polish companies bankruptcy, 19. Spambase, 20. Taxi New York, 21. Allstate, 22. House prices, 23. Shelter Animal outcomes, 24. Weather prediction, 25. Binding energy, 26. Toxicity, 27. Solvation, 28. Coil-20, 29. USPS
• Report: 8-12 slides to illustrate the results and analysis.
• Collaboration: Collaboration is allowed, but each student must work on their own dataset.
• Final presentation: Not required. 15 points will be added to the final grade if you give a presentation.
Project 2. A 20-minute presentation on one of the following topics.
• Topics: 1. Convolutional neural network (CNN), 2. Generative adversarial network (GAN), 3. Recurrent neural network (RNN), 4. Boltzmann machine, 5. Long short-term memory (LSTM), 6. Reinforcement learning
• Final presentation: Required.