Part 1: Classification

In Part 1 of this assignment, you will explore different classification methods on a modified version of a real dataset, the Cardiovascular Disease Dataset. The target class, Cardiovascular disease, is balanced in this dataset. The features and target of the Cardiovascular Disease Dataset are as follows:

Age - positive int (days)

Height - positive int (cm)

Weight - positive float (kg)

Systolic blood pressure - positive int

Diastolic blood pressure - positive int

Gender - categorical [F, M]

Cholesterol - categorical [normal, above normal, well above normal]

Glucose - categorical [normal, above normal, well above normal]

Smoking - categorical [No, Yes]

Alcohol intake - categorical [No, Yes]

Physical activity - categorical [No, Yes]

Cardiovascular disease - categorical [No, Yes]

Your tasks for this part involve applying pre-processing techniques and writing classification functions that can be applied to this dataset using stratified 10-fold cross-validation.

After providing these functions, your next task is to design two ensemble functions, test them over a range of hyperparameters, and evaluate their performance: use stratified 10-fold cross-validation for bagging and a validation set for Adaboost. Use the provided cvkfold in all functions whenever you perform cross-validation.

Although it is not always necessary to wrap your code in functions when using Jupyter Notebooks, doing so allows us to test your implementations. Wherever relevant, pass random_state=0 to control for randomness between runs and ensure your results are reproducible. Further instructions can be found in the scaffold notebook.

Details on the four tasks of this part are as follows:

1. Pre-processing techniques:

Replacing missing values

Min-max normalisation
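The two pre-processing steps above can be sketched roughly as follows. This is only an illustration, assuming scikit-learn is available and that missing values are replaced by the column mean; the scaffold notebook may prescribe a different strategy and its own function names. The toy array X is made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 4 rows, 2 numeric features, one missing value.
X = np.array([[50.0, 160.0],
              [60.0, np.nan],
              [70.0, 180.0],
              [80.0, 170.0]])

# Assumption: each missing value is replaced by the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

# Min-max normalisation rescales every column to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_imputed)

print(X_scaled)
```

Imputing before scaling matters here: the scaler's minimum and maximum are then computed on complete columns.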

2. Classification methods:

K-Nearest Neighbours

Naive Bayes

Decision Trees

Support Vector Machine
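As a rough sketch of how the four classifiers above might be scored with stratified 10-fold cross-validation in scikit-learn: the data here is synthetic, the cvkfold object is a stand-in for the one provided in the scaffold, and the helper name cv_accuracy and all hyperparameter values are illustrative assumptions rather than the prescribed ones.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stand-in for the cvkfold provided in the scaffold (assumption).
cvkfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def cv_accuracy(model, X, y):
    """Hypothetical helper: mean accuracy over the stratified folds."""
    return cross_val_score(model, X, y, cv=cvkfold, scoring="accuracy").mean()


# Hyperparameter values below are illustrative only.
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "svm": SVC(kernel="linear", C=1.0, random_state=0),
}

scores = {name: cv_accuracy(model, X, y) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```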

3. Ensemble methods:

Bagging

Adaboost
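The two evaluation modes for the ensembles could look something like the sketch below: bagging is scored with stratified 10-fold cross-validation, while Adaboost is fitted on a training split and scored on a held-out validation set. The synthetic data, the split size, the stand-in cvkfold, and every hyperparameter value are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stand-in for the cvkfold provided in the scaffold (assumption).
cvkfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Bagging over shallow decision trees, scored by cross-validation.
bag = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    n_estimators=20,
    max_samples=0.8,
    random_state=0,
)
bag_score = cross_val_score(bag, X, y, cv=cvkfold, scoring="accuracy").mean()

# Adaboost over decision stumps, scored on a stratified validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, random_state=0),
    n_estimators=50,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X_train, y_train)
ada_score = ada.score(X_val, y_val)

print(f"bagging (10-fold CV): {bag_score:.4f}")
print(f"adaboost (validation set): {ada_score:.4f}")
```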

4. Hyperparameter tuning:

Choose appropriate parameters

Test and return the highest-performing parameters for each method
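One possible shape for the tuning step is sketched below: a grid search with cross-validation for bagging, and a plain loop over a validation set for Adaboost, each returning the highest-scoring parameters. The grids, the synthetic data, and the stand-in cvkfold are assumptions; appropriate parameter ranges are left to you, as the task states.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)

# Synthetic stand-in for the pre-processed dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stand-in for the cvkfold provided in the scaffold (assumption).
cvkfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Bagging: exhaustive grid search scored by stratified 10-fold CV.
bag_grid = {"n_estimators": [10, 20], "max_samples": [0.5, 1.0]}
search = GridSearchCV(
    BaggingClassifier(random_state=0), bag_grid, cv=cvkfold, scoring="accuracy"
)
search.fit(X, y)
best_bag_params = search.best_params_
best_bag_score = search.best_score_

# Adaboost: manual loop, keeping the best score on the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
best_ada_score, best_ada_params = -1.0, None
for n_estimators in [25, 50]:
    for learning_rate in [0.5, 1.0]:
        model = AdaBoostClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            random_state=0,
        )
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        if score > best_ada_score:
            best_ada_score = score
            best_ada_params = {
                "n_estimators": n_estimators,
                "learning_rate": learning_rate,
            }

print(best_bag_params, best_bag_score)
print(best_ada_params, best_ada_score)
```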

IMPORTANT: Do not remove the ### comments in the scaffold or you will be unable to run the tests. Do not rename your functions, and use the variable names prescribed in the instructions wherever they are given.

During marking, the Notebook.ipynb file will be run cell by cell in order, except for cells containing ### SKIP.

Note on the dataset:

The dataset used in this assignment is a modified version of the Cardiovascular Disease Dataset. Further details on the original dataset can be found at:

https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset

https://www.kaggle.com/datasets/aiaiaidavid/cardio-data-dv13032020