MTH 496 – Machine Learning
Final Project
Due date: 5:00 pm, Friday, Dec 11, 2020
November 17, 2020
Note 1: Select only one of the following projects as your final project. Alternatively, you may propose your own project.
Note 2: If you give a final presentation, 15 points will be added to your final grade (100 pts in total). Opportunities are limited; email me before Nov 19 if you plan to give a presentation.
Project 1. There are 29 datasets generated from real applications. Choose one dataset as the target training/test set and use the methods covered this semester to study it. Finally, write a report on your study of the machine learning methods and the target dataset.
• Methods: Use methods provided by the Python sklearn package to test the target dataset. Methods should include linear regression, logistic regression, k-nearest neighbors, k-means, support vector machine (SVM) with kernels, random forest, and gradient boosting decision tree. You should discuss these methods' accuracy (RMSE, Kendall tau, etc.), performance (CPU time, etc.), and applicability (classification or regression) on the dataset.
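As a sketch of the overall testing workflow, the loop below times one method and computes its RMSE. Synthetic data from make_regression stands in for your chosen dataset, and the variable names (X_train, pred, etc.) are illustrative, not required:

```python
import time
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the chosen dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

start = time.process_time()          # CPU time, not wall-clock time
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
cpu_time = time.process_time() - start

rmse = mean_squared_error(y_test, pred) ** 0.5   # root mean squared error
```

The same fit/predict/score pattern applies to every method below, so accuracy and CPU time can be tabulated method by method.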
1. Linear regression: regression method
scikit-learn webpage: class sklearn.linear_model.LinearRegression()
from sklearn.linear_model import LinearRegression
LR = LinearRegression()
LR.fit(X_train, y_train)
pred = LR.predict(X_test)
2. Logistic regression: classification method
scikit-learn webpage: class sklearn.linear_model.LogisticRegression()
from sklearn.linear_model import LogisticRegression
LogisticR = LogisticRegression()
LogisticR.fit(X_train, y_train)
pred = LogisticR.predict(X_test)
3. k-NN: regression and classification method
scikit-learn webpage: class sklearn.neighbors.KNeighborsRegressor() and class sklearn.neighbors.KNeighborsClassifier()
from sklearn.neighbors import KNeighborsRegressor
KNNr = KNeighborsRegressor(n_neighbors=3)
KNNr.fit(X_train, y_train)
pred = KNNr.predict(X_test)
from sklearn.neighbors import KNeighborsClassifier
KNNc = KNeighborsClassifier(n_neighbors=3)
KNNc.fit(X_train, y_train)
pred = KNNc.predict(X_test)
4. Naive Bayes: classification method
scikit-learn webpage: Naive Bayes
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import ComplementNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import CategoricalNB
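The imports above only construct the classifiers; as a minimal sketch of how one of them is actually used (on synthetic data, with illustrative variable names), GaussianNB follows the same fit/predict pattern as the other methods, and the remaining NB variants share this API:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic two-class problem standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
pred = gnb.predict(X_test)
acc = accuracy_score(y_test, pred)   # fraction of correctly classified test points
```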
5. K-Means: clustering method, unsupervised
scikit-learn webpage: sklearn.cluster.KMeans()
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(X_train)
pred = kmeans.predict(X_test)
6. SVM: regression and classification method
scikit-learn webpage: sklearn.svm.SVC and sklearn.svm.SVR
Parameters: kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf'
Support Vector Machines examples
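Unlike the entries above, no code is given for SVM, so here is a minimal sketch of both variants on synthetic data (variable names are illustrative): SVC for classification with the default RBF kernel, and SVR for regression with a linear kernel.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR

# Classification with the default RBF kernel
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=0)
svc = SVC(kernel='rbf')
svc.fit(Xc_train, yc_train)
pred_c = svc.predict(Xc_test)

# Regression with a linear kernel
Xr, yr = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
svr = SVR(kernel='linear')
svr.fit(Xr_train, yr_train)
pred_r = svr.predict(Xr_test)
```

Trying several kernel choices and comparing their accuracy is a natural part of the parameter discussion the report asks for.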
7. Random Forest: regression and classification method
scikit-learn webpage: class sklearn.ensemble.RandomForestRegressor and class sklearn.ensemble.RandomForestClassifier
Parameters:
➔ n_estimators: default=100
➔ max_depth: default=None
➔ min_samples_split: default=2
➔ criterion: {'mse', 'mae'} for regressor and {'gini', 'entropy'} for classifier
➔ min_samples_leaf: default=1
Check the webpage for examples.
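As a sketch, the classifier variant can be run with the parameters listed above spelled out explicitly (synthetic data; variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Defaults written out so each parameter can be varied and discussed
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            criterion='gini', random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
```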
8. Gradient Boosting Decision Tree: regression and classification method
scikit-learn webpage: class sklearn.ensemble.GradientBoostingRegressor and class sklearn.ensemble.GradientBoostingClassifier
Parameters:
➔ loss: {'ls', 'lad', 'huber', 'quantile'}, default='ls' for regressor and {'deviance', 'exponential'}, default='deviance' for classifier
➔ learning_rate: default=0.1
➔ n_estimators: default=100
➔ max_depth: default=3
➔ min_samples_split: default=2
➔ criterion: {'friedman_mse', 'mse', 'mae'}, default='friedman_mse'
➔ min_samples_leaf: default=1
Examples for gradient boosting decision tree
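A minimal sketch of the regressor variant, with the key parameters from the list above set to their defaults (synthetic data; variable names are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learning_rate and n_estimators trade off against each other;
# varying them is a good subject for the parameter discussion
gbr = GradientBoostingRegressor(learning_rate=0.1, n_estimators=100,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
pred = gbr.predict(X_test)
```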
9. Error analysis:
➔ sklearn.metrics.mean_squared_error
➔ scipy.stats.pearsonr
➔ scipy.stats.kendalltau
➔ from sklearn.metrics import accuracy_score
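The four metrics above can be computed on a toy example like this (the arrays are made-up illustrations, not results from any dataset):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import pearsonr, kendalltau

# Toy regression predictions vs. true values
y_true = np.array([3.0, 2.5, 4.0, 5.1, 1.2])
y_pred = np.array([2.8, 2.7, 3.9, 5.0, 1.5])

rmse = mean_squared_error(y_true, y_pred) ** 0.5   # RMSE
r, _ = pearsonr(y_true, y_pred)                    # Pearson correlation
tau, _ = kendalltau(y_true, y_pred)                # Kendall rank correlation

# accuracy_score is for classification labels, not continuous values
acc = accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])
```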
Note: Your parameter settings should be discussed.
• Datasets: 1. Wine Quality, 2. Bike Data, 3. Kobe Bryant, 4. West Nile virus, 5. Year Music prediction, 6. Income classification, 7. Bank marketing, 8. Credit card clients, 9. Drug consumption, 10. Geographical Original of Music, 11. HTRU2, 12. Letter recognition, 13. Mice protein expression, 14. Occupancy detection, 15. Online news popularity, 16. Ozone level detection, 17. Phishing websites, 18. Polish companies bankruptcy, 19. Spambase, 20. Taxi New York, 21. Allstate, 22. House prices, 23. Shelter Animal outcomes, 24. Weather prediction, 25. Binding energy, 26. Toxicity, 27. Solvation, 28. Coil-20, 29. USPS
• Report: 8-12 slides to illustrate the results and analysis.
• Collaboration: Collaboration is allowed, but each student must work on their own dataset.
• Final presentation: Not required. 15 points will be added to the final grade if you give a presentation.
Project 2. A 20-minute presentation on one of the following topics.
• Topics: 1. Convolutional neural network (CNN), 2. Generative adversarial network (GAN), 3. Recurrent neural network (RNN), 4. Boltzmann machine, 5. Long short-term memory (LSTM), 6. Reinforcement learning
• Final presentation: Required.