Final Project

MTH 496 – Machine Learning

Due date: 5:00 pm, Friday, Dec 11, 2020

November 17, 2020


      Note 1 Select only one of the following projects as your final project. Alternatively, you may propose your own project.

      Note 2 If you give a final presentation, 15 points will be added to your final grade (100 pts in total). Opportunities are limited. Email me before Nov 19 if you plan to give a presentation.


Project 1. There are 29 datasets generated from real applications. Choose one dataset as the target training/test set and use the methods covered this semester to study it. Finally, write a report on your study of the machine learning methods and the target dataset.

Methods: Use the methods provided by the Python scikit-learn (sklearn) package to test the target dataset. The methods should include linear regression, logistic regression, k-nearest neighbors, k-means, support vector machines (SVM) with kernels, random forests, and gradient boosting decision trees. You should discuss each method's accuracy (RMSE, Kendall tau, etc.), performance (CPU time, etc.), and applicability (classification or regression) on the dataset.
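
Every method below follows the same fit/predict workflow, so a train/test split is the common starting point. The sketch below is a minimal example; the file name data.csv and the column name target are placeholders for whichever dataset you choose.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Load the chosen dataset; "data.csv" and "target" are placeholder names.
    df = pd.read_csv("data.csv")
    X = df.drop(columns=["target"])
    y = df["target"]

    # Hold out 20% of the samples as the test set.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)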

1. Linear regression: regression method

    scikit-learn webpage: class sklearn.linear_model.LinearRegression()


    from sklearn.linear_model import LinearRegression

    LR = LinearRegression()

    LR.fit(X_train, y_train)

    pred = LR.predict(X_test)


2. Logistic regression: classification method

    scikit-learn webpage: class sklearn.linear_model.LogisticRegression()


    from sklearn.linear_model import LogisticRegression

    LogisticR = LogisticRegression()

    LogisticR.fit(X_train, y_train)

    pred = LogisticR.predict(X_test)


3. k-NN: regression and classification method

    scikit-learn webpage: class sklearn.neighbors.KNeighborsRegressor() and class sklearn.neighbors.KNeighborsClassifier()


    from sklearn.neighbors import KNeighborsRegressor

    KNNr = KNeighborsRegressor(n_neighbors=3)

    KNNr.fit(X_train, y_train)

    pred = KNNr.predict(X_test)


    from sklearn.neighbors import KNeighborsClassifier

    KNNc = KNeighborsClassifier(n_neighbors=3)

    KNNc.fit(X_train, y_train)

    pred = KNNc.predict(X_test)


4. Naive Bayes: classification method

    scikit-learn webpage: Naive Bayes


    from sklearn.naive_bayes import GaussianNB

    gnb = GaussianNB()

    from sklearn.naive_bayes import MultinomialNB

    from sklearn.naive_bayes import ComplementNB

    from sklearn.naive_bayes import BernoulliNB

    from sklearn.naive_bayes import CategoricalNB
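
    The imports above only load the classes. A minimal usage sketch for the Gaussian variant (the other variants follow the same fit/predict pattern):

    # Fit Gaussian Naive Bayes and predict labels on the test set.
    gnb.fit(X_train, y_train)

    pred = gnb.predict(X_test)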


5. K-Means: clustering method, unsupervised

    scikit-learn webpage: class sklearn.cluster.KMeans()


    from sklearn.cluster import KMeans

    kmeans = KMeans(n_clusters=3, random_state=0)

    # K-Means is unsupervised: fit uses only the features, no labels.
    kmeans.fit(X_train)

    pred = kmeans.predict(X_test)


6. SVM: regression and classification method

    scikit-learn webpage: class sklearn.svm.SVC and class sklearn.svm.SVR

    Parameters: kernel: {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=‘rbf’

    Support Vector Machines examples
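
    A minimal sketch in the same style as the snippets above, assuming a classification task (use SVR the same way for regression):

    from sklearn.svm import SVC

    # 'rbf' is the default kernel; compare it with 'linear' and 'poly'.
    svc = SVC(kernel='rbf')

    svc.fit(X_train, y_train)

    pred = svc.predict(X_test)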


7. Random Forest: regression and classification method

    scikit-learn webpage: class sklearn.ensemble.RandomForestRegressor and class sklearn.ensemble.RandomForestClassifier

    Parameters:

    ➔ n_estimators: default=100

    ➔ max_depth: default=None

    ➔ min_samples_split: default=2

    ➔ criterion: {‘mse’, ‘mae’} for regressor and {‘gini’, ‘entropy’} for classifier

    ➔ min_samples_leaf: default=1

    Check the webpage for examples.
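
    A minimal sketch, assuming a classification task (RandomForestRegressor is used the same way for regression):

    from sklearn.ensemble import RandomForestClassifier

    # Default settings: 100 trees, unlimited depth; tune max_depth to limit overfitting.
    RF = RandomForestClassifier(n_estimators=100, random_state=0)

    RF.fit(X_train, y_train)

    pred = RF.predict(X_test)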


8. Gradient Boosting Decision Tree: regression and classification method

    scikit-learn webpage: class sklearn.ensemble.GradientBoostingRegressor and class sklearn.ensemble.GradientBoostingClassifier

    Parameters:

    ➔ loss: {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, default=‘ls’ for regressor and {‘deviance’, ‘exponential’}, default=‘deviance’ for classifier

    ➔ learning_rate: default=0.1

    ➔ n_estimators: default=100

    ➔ max_depth: default=3

    ➔ min_samples_split: default=2

    ➔ criterion: {‘friedman_mse’, ‘mse’, ‘mae’}, default=‘friedman_mse’

    ➔ min_samples_leaf: default=1

    Examples for gradient boosting decision tree
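
    A minimal sketch, assuming a classification task (GradientBoostingRegressor follows the same pattern):

    from sklearn.ensemble import GradientBoostingClassifier

    # Boosting combines many shallow trees (max_depth=3 by default) with a small learning rate.
    GBDT = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)

    GBDT.fit(X_train, y_train)

    pred = GBDT.predict(X_test)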


9. Error analysis:

    ➔ sklearn.metrics.mean_squared_error

    ➔ scipy.stats.pearsonr

    ➔ scipy.stats.kendalltau

    ➔ sklearn.metrics.accuracy_score
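
    A minimal sketch of computing these measures, assuming y_test and pred come from one of the regressors above (use accuracy_score instead for classifiers):

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from scipy.stats import pearsonr, kendalltau

    # RMSE is the square root of the mean squared error.
    rmse = np.sqrt(mean_squared_error(y_test, pred))

    r, _ = pearsonr(y_test, pred)      # Pearson correlation coefficient
    tau, _ = kendalltau(y_test, pred)  # Kendall rank correlation (tau)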

Note: The parameter settings should be discussed in the report.

Datasets: 1. Wine Quality, 2. Bike Data, 3. Kobe Bryant, 4. West Nile virus, 5. Year Music prediction, 6. Income classification, 7. Bank marketing, 8. Credit card clients, 9. Drug consumption, 10. Geographical original of music, 11. HTRU2, 12. Letter recognition, 13. Mice protein expression, 14. Occupancy detection, 15. Online news popularity, 16. Ozone level detection, 17. Phishing websites, 18. Polish companies bankruptcy, 19. Spambase, 20. Taxi New York, 21. Allstate, 22. House prices, 23. Shelter Animal outcomes, 24. Weather prediction, 25. Binding energy, 26. Toxicity, 27. Solvation, 28. Coil-20, 29. USPS

Report: 8-12 slides illustrating the results and analysis.

Collaboration: Collaboration is allowed, but each student must study their own dataset.

Final presentation: Not required. 15 points will be added to your final grade if you give a presentation.


Project 2. A 20-minute presentation on one of the following topics.

Topics: 1. Convolutional neural network (CNN), 2. Generative adversarial network (GAN), 3. Recurrent neural network (RNN), 4. Boltzmann machine, 5. Long short-term memory (LSTM), 6. Reinforcement learning

Final presentation: Required.