COMP10082 Machine Learning for Data Analytics

Coursework (Semester 1: 2023)

Component 1 – Portfolio of Practical Work (Weighting 50%)

Due Date: 4th December 2023

Overview

This component consists of a portfolio of the coursework students have completed during the module.

Submission

Students are required to submit the following.

Certificates for the following Kaggle Courses

1.    Intro to Machine Learning

2.    Intermediate Machine Learning

3.    Pandas

4.    Data Cleaning

Completed Jupyter Notebooks

1.    Week 4 Part 3:

Multiple Regression Exercise from scratch

2.    Week 6 Part 2:

Classification with SVM

3.    Week 8 Part 2:

k-means clustering from scratch

4.    Week 10 Part 3:

Regression model with Keras (Creating your own models)

Component 2 – Malware Classification and Report (Weighting 50%)

Due Date: 5th January 2024

Overview

Download the Drebin dataset from Moodle. The drebin-215-dataset contains 215 software attributes extracted from over 15,000 Android applications along with a classification in the final column, ‘class’. The dataset-features-categories file provides a descriptor of each value in the vector and an explanation of the class.

Students should read the data into a Jupyter notebook and apply a variety of classification models (minimum of 3) to the data supported by a critical analysis of each of the approaches taken. Your analysis should explain your approach and give an overview with the rationale and mathematical basis for each model you have applied. You should support your findings with empirical observations from your models and appropriate data visualisations. Your analysis should identify the strengths and weaknesses of each approach and you should conclude by recommending a classification technique based on the work you have conducted and your empirical observations.
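As an illustration only, the workflow described above (read the data, fit at least three classifiers, collect empirical metrics) could be sketched as follows. This is a minimal sketch under stated assumptions, not a model answer: the CSV path passed to `load_features_and_labels`, the `?` missing-value marker, the three chosen models and the synthetic demonstration data are all hypothetical, and the only detail taken from the brief is that the label sits in a column named `class`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def load_features_and_labels(csv_path, label_column="class"):
    """Read a CSV and split it into a feature matrix and a label vector."""
    # Assumption: missing values, if any, are marked with "?".
    frame = pd.read_csv(csv_path, na_values="?", low_memory=False)
    frame = frame.dropna()
    X = frame.drop(columns=[label_column]).astype(int)
    y = frame[label_column]
    return X, y


def compare_models(X, y, seed=0):
    """Fit three classifiers on one stratified split and return test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Assumption: these three models are examples, not a required selection.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm_rbf": SVC(kernel="rbf"),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, predicted),
            "macro_f1": f1_score(y_test, predicted, average="macro"),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic stand-in for the real data: 400 samples with 20 binary features.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(400, 20))
y_demo = np.where(X_demo[:, :5].sum(axis=1) >= 3, "malware", "benign")

results = compare_models(X_demo, y_demo)
print(results)
```

With the real data, the demonstration block would be replaced by a call such as `X, y = load_features_and_labels("drebin-215-dataset.csv")`, where the file name is a guess at how the download is saved. A single split gives only one estimate per model, so cross-validation, confusion matrices and plots would be needed to support the critical analysis the brief asks for.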

Submission

You should submit a single Jupyter notebook which contains both your models and analysis.

Marking Schemes

Component 1 - Portfolio

Kaggle Course Certificates

Each certificate is worth 10% of the marks for this component and will be marked as either a pass or fail. (Maximum of 40%).

Jupyter Notebooks

Each notebook is worth 15% of the marks for this component. It is expected that notebooks will be annotated with a description of and rationale for the steps taken and that results presented are fully explained. (Maximum of 60%).
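The weighting arithmetic for Component 1 can be sketched as below. The 10% and 15% weights come from the scheme above; the idea of expressing each notebook's grade as a fraction between 0 and 1 is an assumption made purely for illustration, since the scheme does not say how letter grades convert to numbers.

```python
CERTIFICATE_WEIGHT = 10  # percent of Component 1 per certificate (pass/fail)
NOTEBOOK_WEIGHT = 15     # percent of Component 1 per notebook


def component1_mark(certificates_passed, notebook_fractions):
    """Return the Component 1 mark as a percentage.

    notebook_fractions is a hypothetical list of four scores in [0, 1],
    one per notebook; the letter-grade conversion is not specified.
    """
    if not 0 <= certificates_passed <= 4:
        raise ValueError("there are four certificates")
    if len(notebook_fractions) != 4:
        raise ValueError("there are four notebooks")
    if any(not 0.0 <= f <= 1.0 for f in notebook_fractions):
        raise ValueError("each notebook fraction must lie in [0, 1]")
    certificate_marks = CERTIFICATE_WEIGHT * certificates_passed
    notebook_marks = sum(NOTEBOOK_WEIGHT * f for f in notebook_fractions)
    return certificate_marks + notebook_marks


# All four certificates passed and full marks on every notebook: 40 + 60.
print(component1_mark(4, [1.0, 1.0, 1.0, 1.0]))
# Three certificates passed and half marks on every notebook: 30 + 30.
print(component1_mark(3, [0.5, 0.5, 0.5, 0.5]))
```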

Grade: Jupyter Notebook

A1: No improvement possible

A2: All notebooks are completed to outstanding standards reflecting a deep understanding of the models created. The code produced is professionally structured and makes full use of the techniques introduced on the module.

A3: Fully completed to a professional standard; a clear understanding of the models created is obvious. The code produced is well structured and makes appropriate use of the techniques introduced on the module.

B1: Completed to a high standard; an understanding of the models created is obvious. The code produced is structured and makes appropriate use of the techniques introduced on the module.

B2: Models are created and largely correct. Code is structured but may have omissions or lack clarity.

C: Models are created and mainly correct. Code is structured but may have omissions or lack clarity.

D: Models are created and all elements run, although some output may be incorrect. Code is structured but may have omissions or lack clarity.

E: Models do not run in the notebook or output is largely incorrect. Code produced lacks structure.

N: No attempt

Component 2 - Malware Classification and Report

Grade: Jupyter Notebook; Analysis

A1: No improvement possible

A2:

Jupyter Notebook: All notebooks are completed to outstanding standards reflecting a deep understanding of the models created. The code produced is professionally structured and makes full use of the techniques introduced on the module.

Analysis: Analysis is insightful and fully supported by metrics and data visualisations produced from the models created. A full understanding of the mathematical principles supporting the work is demonstrated.

A3:

Jupyter Notebook: Fully completed to a professional standard; a clear understanding of the models created is obvious. The code produced is well structured and makes appropriate use of the techniques introduced on the module.

Analysis: Analysis is clear and well supported by metrics and data visualisations produced from the models created. A good understanding of the mathematical principles supporting the work is demonstrated.

B1:

Jupyter Notebook: Completed to a high standard; an understanding of the models created is obvious. The code produced is structured and makes appropriate use of the techniques introduced on the module.

Analysis: Analysis is supported by metrics and data visualisations produced from the models created. An understanding of the mathematical principles supporting the work is demonstrated.

B2:

Jupyter Notebook: Models are created and largely correct. Code is structured but may have omissions or lack clarity.

Analysis: Analysis is supported by metrics and data visualisations produced from the models created. Some understanding of the mathematical principles supporting the work is demonstrated.

C:

Jupyter Notebook: Models are created and mainly correct. Code is structured but may have omissions or lack clarity.

Analysis: Analysis is partially supported by metrics and data visualisations produced from the models created. Some understanding of the mathematical principles supporting the work is demonstrated.

D:

Jupyter Notebook: Models are created and all elements run, although some output may be incorrect. Code is structured but may have omissions or lack clarity.

Analysis: Some metrics and data visualisations are produced from the models created. A limited understanding of the mathematical principles supporting the work is demonstrated.

E:

Jupyter Notebook: Models do not run in the notebook or output is largely incorrect. Code produced lacks structure.

Analysis: Limited or no output. No theoretical support for the models is included.

N: No attempt