COMP809 – Data Mining and Machine Learning

Lab 7 – Classifier Models

• The two major objectives of this lab are to

o configure Python’s implementation of some of the most widely used classifiers

o evaluate these classifiers using a variety of different metrics.

• Classifiers will be configured using Python’s sklearn library. Evaluation will also be done via sklearn, using a dedicated set of methods designed specifically for computing metrics.
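As a rough illustration of the kind of metric functions meant here, the sketch below applies accuracy_score, precision_score and recall_score from sklearn.metrics to two short label lists. The lists y_true and y_pred are hypothetical, invented for this example, and are not taken from the lab dataset.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical binary labels, invented purely for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # fraction of labels predicted correctly
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) for the positive class (1)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) for the positive class (1)

print(accuracy, precision, recall)
```

With 3 true positives, 1 false positive and 1 false negative among 8 samples, all three values come out as 0.75 for these particular lists.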

1. Importing the libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score

from sklearn.tree import DecisionTreeClassifier

from sklearn.naive_bayes import GaussianNB, MultinomialNB

from sklearn.metrics import accuracy_score

from sklearn.neighbors import KNeighborsClassifier

from sklearn.neural_network import MLPClassifier

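from sklearn.metrics import precision_score, recall_score, auc

# plot_roc_curve exists only in scikit-learn versions before 1.2;

# on newer versions import RocCurveDisplay from sklearn.metrics instead

from sklearn.metrics import roc_curve, roc_auc_score, plot_roc_curve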

2. Loading and inspecting the dataset

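path = r"P:\COMP809\Iris.xlsx"  # raw string keeps the backslashes literal; change the path accordingly

rawdata = pd.read_excel(path)  # pip install openpyxl (recent pandas reads .xlsx with openpyxl; xlrd 2.x only supports .xls)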

print ("data summary")

print (rawdata.describe())

nrow, ncol = rawdata.shape

print (nrow, ncol)

print ("\n correlation Matrix")

print (rawdata.corr())

rawdata.hist()

plt.show()

3. Display correlations between all pairs of features

pd.plotting.scatter_matrix(rawdata,figsize=[8,8])

plt.show()

# boxplot

fig = plt.figure(1, figsize=(9, 6))

ax = fig.add_subplot(111)

ax.boxplot(rawdata.values)

ax.set_xticklabels(['Petal Length', 'Petal Width', 'Sepal Length', 'Sepal Width', 'Class'])

plt.show()

4. Get the predictors – all columns from 0 to last but one

predictors = rawdata.iloc[:,:ncol-1]

print(predictors)

#index to last column to obtain class values

target = rawdata.iloc[:,-1]

print(target)
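To make the two iloc slices concrete, here is a minimal sketch on a made-up three-row frame. The frame toy and its column names f1 and f2 are hypothetical stand-ins for rawdata; only the slicing pattern mirrors the lab code.

```python
import pandas as pd

# Hypothetical three-row frame standing in for rawdata; the last column holds the class
toy = pd.DataFrame({
    "f1": [5.1, 4.9, 6.3],
    "f2": [3.5, 3.0, 3.3],
    "Class": [0, 0, 2],
})

nrow, ncol = toy.shape
toy_predictors = toy.iloc[:, :ncol - 1]  # every column except the last
toy_target = toy.iloc[:, -1]             # the last column only

print(list(toy_predictors.columns))
print(toy_target.tolist())
```

The first slice returns a DataFrame with the columns f1 and f2, while the second returns a Series holding the class values.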

5. Partition data using a train/test split

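By referring to https://scikit-learn.org/0.16/modules/generated/sklearn.cross_validation.train_test_split.html, complete the right-hand side of the line below and set the training set size to 70% of the size of the dataset. Note that in current scikit-learn releases the function lives in sklearn.model_selection, as imported in step 1.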

pred_train, pred_test, tar_train, tar_test = train_test_split()
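One possible way to set up a 70% training split is sketched below. It is only an illustration under stated assumptions: sklearn's bundled Iris data (load_iris) stands in for the Excel file, the names X_train, X_test, y_train and y_test are chosen for this example, and random_state=0 is an arbitrary seed that the lab does not prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled Iris data stands in for the lab's Excel file
X, y = load_iris(return_X_y=True)

# train_size=0.7 keeps 70% of the rows for training; random_state (an arbitrary
# choice here) makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

print(X_train.shape, X_test.shape)
```

Passing test_size=0.3 instead of train_size=0.7 should give the same partition sizes. In the lab itself, the first two arguments would be predictors and target.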

6. Configure the Decision Tree Classifier

split_threshold=4

fpr = dict()  # store the false positive rate in a dictionary object

tpr = dict() # likewise, store the true positive rate

roc_auc = dict()

By referring to https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier, set the entropy criterion for splitting and set the minimum number of samples (objects) required to split a decision node. (Note: this threshold should be greater than 1.)
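A minimal sketch of one such configuration follows, using the criterion and min_samples_split parameters of DecisionTreeClassifier. The bundled Iris data, the variable name clf and the random_state seed are assumptions made for this example and are not part of the lab code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: sklearn's bundled Iris data stands in for the lab's Excel file
X, y = load_iris(return_X_y=True)

split_threshold = 4  # same value as in the lab code

# criterion="entropy" selects information gain as the split measure;
# min_samples_split is the minimum number of samples a node needs before it may be split
clf = DecisionTreeClassifier(
    criterion="entropy",
    min_samples_split=split_threshold,
    random_state=0,  # arbitrary seed, only for repeatability
)
clf.fit(X, y)

print(clf.get_params()["criterion"], clf.get_params()["min_samples_split"])
```

Raising split_threshold generally produces a shallower tree, because nodes holding fewer samples than the threshold are left as leaves.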