
Assessment 3: Project

Published: 2024-06-19



Assessment 3: Overview

Weight - 20%

Due - End of Week 6, Sunday 10.00 pm (Sydney time)

Expected time

Allow approximately 25–30 hours to complete this assessment. Please note that the estimated workload to complete the assessment may vary depending on the level of your technical background.

What you need

The required software for the modelling in Part A is available on the slide displayed underneath this overview slide.

Instructions

Part A) You will complete this part of the assessment in Ed. Choose one of the two options given:

1. Option I: Foundations of Neural Networks

2. Option II: Tree and Ensemble Learning

You can use Python or R, or both, depending on whichever is suitable. You can also use a Python notebook or R Markdown for coding.

You are free to use your own IDE and PC. You just need to upload screenshots of code if you are not using Ed to run the code.

Part B) Write a report to describe the steps performed to develop the model and evaluate its performance. Provide written justifications, with clearly articulated reasons, for the steps you took to build the model.

How to submit

Part A (code and data) is submitted via the Ed learning platform. Part B (pdf report) is submitted through Turnitin on the Assessment submission page in Moodle.

In Ed, you can use model.py or model.r for the main code, which should read the data and run. Alternatively, you can upload your code notebook. Upload screenshots of the console if the code does not run on Ed and you are using a local machine to run it.

We recommend that you use Scikit-learn in case of Python since it runs faster on Ed.

Do not include any code in your report that will be submitted to Moodle.

Marking and feedback

A rubric is available on the Moodle assessment page as well as at the bottom of the lesson here on Ed. Feedback and results will be provided to you approximately 7-10 days after the deadline. Please note that, due to the large number of students, the feedback could be delayed.

FAQ

See Ed Discussions.

Option I - A: Foundations of Neural Networks

Imagine yourself as a data scientist and build a neural network model using the given dataset.

Data set

Refer to the documentation for the given data set, available at the link below:

"Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem."

Source

http://archive.ics.uci.edu/ml/datasets/Abalone


Name / Data Type / Measurement Unit / Description

Sex / nominal / -- / M, F, and I (infant)

Length / continuous / mm / Longest shell measurement

Diameter / continuous / mm / perpendicular to length

Height / continuous / mm / with meat in shell

Whole weight / continuous / grams / whole abalone

Shucked weight / continuous / grams / weight of meat

Viscera weight / continuous / grams / gut weight (after bleeding)

Shell weight / continuous / grams / after being dried

Rings / integer / -- / gives the age in years

The readme file contains attribute statistics.
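
As a rough sketch of how the attributes above might be read into Python, the snippet below assigns the table's names as column headers and one-hot encodes the nominal Sex attribute. It assumes the data file is comma-separated with no header row; the helper name load_abalone is hypothetical, and the rows in SAMPLE are only illustrative values in that layout so the snippet runs without downloading anything.

```python
import io

import pandas as pd

# Column names follow the attribute table above (assumed: the raw file has no header row).
COLUMNS = ["Sex", "Length", "Diameter", "Height", "Whole weight",
           "Shucked weight", "Viscera weight", "Shell weight", "Rings"]

# Illustrative rows in the assumed comma-separated layout.
# In practice, pass the path of the downloaded data file to load_abalone().
SAMPLE = """M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
"""


def load_abalone(source):
    """Read the data and one-hot encode the nominal Sex attribute (hypothetical helper)."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # Fix the category order so the dummy columns are always Sex_M, Sex_F, Sex_I.
    sex = df["Sex"].astype(pd.CategoricalDtype(["M", "F", "I"]))
    dummies = pd.get_dummies(sex, prefix="Sex", dtype=float)
    return pd.concat([dummies, df.drop(columns="Sex")], axis=1)


df = load_abalone(io.StringIO(SAMPLE))
print(df.head())
```

One-hot encoding is only one reasonable choice for Sex; whichever encoding you use, it is worth justifying in the report.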

Instructions

Clean the above data set with data processing code and then prepare it for modelling using (dense) neural networks. Build a neural network model using either Keras or scikit-learn, in Python or R. Understand the given problem and identify the respective inputs and outputs of the proposed model.

Converting into classes

You treat the project as a classification problem. You show results for the ring age classified into 4 major groups, i.e. 4 output neurons, using the following ring age groups:

· Class 1: 0 - 7 years

· Class 2: 8 - 10 years

· Class 3: 11 - 15 years

· Class 4: Greater than 15 years

Include class distribution as part of the data visualization in Step 1 below.

Note that the response variable (Rings) is numeric. However, in this assessment, the problem is treated as a classification problem with four classes.
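
The grouping above can be sketched with NumPy as follows. This is a minimal illustration, assuming the ring counts are already available as integers; the function name rings_to_class and the example values are made up for the demonstration.

```python
import numpy as np

# Inclusive upper bounds of Classes 1-3; anything above 15 falls into Class 4.
UPPER_BOUNDS = [7, 10, 15]


def rings_to_class(rings):
    """Map ring counts to class labels 1..4 using the groups listed above."""
    rings = np.asarray(rings)
    # right=True makes each bound inclusive: 7 -> Class 1, 8 -> Class 2, and so on.
    return np.digitize(rings, UPPER_BOUNDS, right=True) + 1


rings = np.array([1, 7, 8, 10, 11, 15, 16, 29])  # made-up example values
labels = rings_to_class(rings)
print(labels)  # [1 1 2 2 3 3 4 4]

# Class distribution, e.g. as input for a bar chart in Step 1.
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```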

Steps to execute the project

Consider the following steps to build and evaluate the model:

1. Analyse and visualise the given dataset by reporting the distribution of classes, the distribution of features and any other visualisation you find appropriate.

2. Develop a dense neural network with one hidden layer. Vary the number of hidden neurons to be 5, 10, 15, and 20 in order to investigate the performance of the model using Stochastic Gradient Descent (SGD). Determine the optimal number of neurons in the hidden layer from the range of values considered.

3. Investigate the effect of learning rate (using SGD) for the selected dataset (using the optimal number of hidden neurons).

4. Investigate the effect of a different number of hidden layers: modify the model by adding another hidden layer. Use the optimal number of hidden neurons from Step 2 for both layers and the optimal learning rate from Step 3. Investigate the effect of this change in the number of hidden layers (using SGD).

5. Investigate the effect of Adam and SGD on training and test performance.

6. Take the final optimal model among all the above cases and show the confusion matrix and ROC/AUC curve for the different classes of the multi-class problem.

Evaluate the optimal model using the classification accuracy score on test data.
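
The evaluation in Step 6 could be sketched in scikit-learn roughly as follows. The data is a synthetic stand-in generated with make_classification, and the single hidden layer of 10 neurons, the Adam solver and the stratified split are assumptions made only for illustration; they are not the expected optimal model.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, label_binarize

# Synthetic stand-in for the prepared Abalone features and 4-class labels.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)  # stratify is an assumption

scaler = StandardScaler().fit(X_train)
model = MLPClassifier(hidden_layer_sizes=(10,), solver="adam",
                      max_iter=500, random_state=0)
model.fit(scaler.transform(X_train), y_train)

proba = model.predict_proba(scaler.transform(X_test))
pred = model.classes_[proba.argmax(axis=1)]

print("test accuracy:", accuracy_score(y_test, pred))
cm = confusion_matrix(y_test, pred, labels=model.classes_)
print(cm)

# One-vs-rest ROC: binarise the labels and score each class against the rest.
y_bin = label_binarize(y_test, classes=model.classes_)
per_class_auc = {}
for k, cls in enumerate(model.classes_):
    fpr, tpr, _ = roc_curve(y_bin[:, k], proba[:, k])  # points for plotting the curve
    per_class_auc[int(cls)] = roc_auc_score(y_bin[:, k], proba[:, k])
print(per_class_auc)
print("macro one-vs-rest AUC:", roc_auc_score(y_test, proba, multi_class="ovr"))
```

The fpr and tpr arrays can be passed to any plotting library to draw one ROC curve per class.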

Note that Steps 2 to 5 require 10 experimental runs (with different initial weights) for each case, for which you report the mean and 95% confidence interval of accuracy. You need to select the appropriate metrics, i.e. for classification, report performance on the train and test datasets. Use a 60/40 percent train/test split for the given data set (the data split remains fixed across experiments). Note that there is no need to have a validation set.
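
One possible way to organise the repeated runs is sketched below for the hidden-neuron sweep of Step 2. It assumes scikit-learn's MLPClassifier, a t-based 95% confidence interval, a learning rate of 0.01 and 200 training iterations; the data is a synthetic stand-in, and the helper names mean_ci95 and run_experiments are hypothetical.

```python
import warnings

import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def mean_ci95(scores):
    """Return the mean and the half-width of a t-based 95% confidence interval."""
    scores = np.asarray(scores, dtype=float)
    half = stats.t.ppf(0.975, df=len(scores) - 1) * stats.sem(scores)
    return scores.mean(), half


def run_experiments(X_train, y_train, X_test, y_test, hidden, n_runs=10, lr=0.01):
    """Train n_runs models that differ only in their random seed."""
    train_acc, test_acc = [], []
    for seed in range(n_runs):
        # random_state controls the initial weights (and mini-batch shuffling).
        model = MLPClassifier(hidden_layer_sizes=hidden, solver="sgd",
                              learning_rate_init=lr, max_iter=200,
                              random_state=seed)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(X_train, y_train)
        train_acc.append(model.score(X_train, y_train))
        test_acc.append(model.score(X_test, y_test))
    return mean_ci95(train_acc), mean_ci95(test_acc)


# Synthetic stand-in for the prepared Abalone data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
# Fixed 60/40 split, reused for every experiment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for n_hidden in (5, 10, 15, 20):
    results[n_hidden] = run_experiments(X_train, y_train, X_test, y_test,
                                        hidden=(n_hidden,), n_runs=10)
    (tr_m, tr_h), (te_m, te_h) = results[n_hidden]
    print(f"{n_hidden:>2} neurons: train {tr_m:.3f} +/- {tr_h:.3f}, "
          f"test {te_m:.3f} +/- {te_h:.3f}")
```

The same run_experiments helper can be reused for the learning-rate study by varying lr, and for the two-layer model by passing a tuple with two entries as hidden.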

Additional tasks (not a requirement and no extra marks will be given)

· You can also feature additional visualisations, such as error plots on the train and test split for the optimal model over time, and any other visualisations of the training/test performance.

· Using Adam/SGD, compare L2 regularisation (weight decay) to dropout for selected hyperparameters. Then compare with Adam with no regularisation.

· Hybrid Dropout and Weight Decay: Using Adam, compare L2 regularisation (weight decay) with dropout. Show results for 3 different combinations (can be more) of hyperparameters (dropout rate with weight decay hyperparameter (λ)).

Installation: You should install the required libraries and run the experiments on your personal computer and upload the results/code to Ed later. Note that the code will not be evaluated. Marks will be given only for your report. You can also submit a readme.txt with your submission that gives an overview of your files/code. We need your code for a plagiarism check in case we are suspicious about your report.

How to submit

Click on the submit button to submit your code.

We recommend that you use Scikit-learn in case of Python since it runs faster on Ed.

DISCUSSION

What is a good model? You need to decide that with trial runs, i.e. how many iterations are needed to get good performance on the train and test datasets. You can make convergence plots and then decide when it is best to stop training.
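
A convergence curve of this kind can be collected, for example, by training one epoch at a time and recording the accuracy after each pass. The sketch below assumes scikit-learn's MLPClassifier with partial_fit; the synthetic data, 10 hidden neurons, learning rate of 0.01 and 100 epochs are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared Abalone data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                      learning_rate_init=0.01, random_state=0)
classes = np.unique(y_train)
n_epochs = 100
train_curve, test_curve = [], []
for epoch in range(n_epochs):
    # Each partial_fit call performs a single pass over the training data.
    model.partial_fit(X_train, y_train, classes=classes)
    train_curve.append(model.score(X_train, y_train))
    test_curve.append(model.score(X_test, y_test))

best_epoch = int(np.argmax(test_curve)) + 1
print(f"best test accuracy {max(test_curve):.3f} at epoch {best_epoch}")
```

Plotting train_curve and test_curve against the epoch number gives the convergence plot; the point where the test curve flattens or starts to fall is a candidate stopping time.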

Option I - B: Report Task

Write a report to describe the steps performed in Part A to develop the model and evaluate its performance.

Provide brief justifications, with clearly articulated reasons, for the steps you took to build the model you submitted. Please note that you are free to use your own writing style and should provide references as needed. The following suggestions/guidelines are not mandatory and are provided mainly for informational purposes.

Suggestions/Guidelines for Presentation style/format

· Use the IEEE Conference paper template in LaTeX or Word:

https://www.ieee.org/conferences/publishing/templates.html

https://www.overleaf.com/latex/templates/ieee-conference-template-example/nsncsyjfmpxy

· Your report should have the following sections: Problem definition (abstract and introduction), methodology, results, and conclusion. To get more information on these sections, click here. You are encouraged to cite at least 10 references in your technical report. Note that the introduction highlights the literature, the aim and goals, and the general problem you are trying to solve. You are free to use different section titles or report style, although this style of reporting is encouraged.

· Quantitative information should be clearly described and appropriately communicated (e.g. using figures and tables that are appropriately labelled).

· There is no strict word limit.

· Your report should be written using correct spelling, grammar, and punctuation.

· Follow IEEE referencing style.

· You need to submit code that runs in Ed. If your report takes into account N=10 experiments, for example, your submitted code needs to have N=1 in the for loop that repeats the experiments. You should use functions/methods. You can upload the results that are used to generate the plots that will be part of your report, and keep the plotting code as a separate file if you wish.

· You can also submit a readme.txt with your submission that gives an overview of your files/code.

Writing tips:

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606

How to submit

This assessment is submitted through Turnitin on the Assessment submission page in Moodle.

Do not include any code in your report that will be submitted to Moodle.

Option II - A: Tree and Ensemble Learning

This option will feature components from decision trees, random forests, and ensemble learning.

Use the Abalone dataset given in Part A: Option I. Now you need to apply CART to the same problem and report the classification performance on the train and test set using the same train/test split. In addition to Task 1 of Option I-A on analysis and visualisation of the given dataset, execute the following tasks.

1. Report the tree visualisation (show your tree and also translate a few selected nodes and leaves into IF and THEN rules).

2. Investigate improving performance further by either pre-pruning or post-pruning the tree: https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html

3. Apply Bagging of Trees via Random Forests and show performance (e.g., accuracy score) as the number of trees in the ensemble increases. Carry out 10 experiments (a minimum of 2 experiments is fine, with different random states in the train/test split) in Tasks 2 and 3 and show performance accuracy with mean and confidence interval. Note that Task 2 may have the same results for every experimental run.

4. Optional: Compare results with Adam and SGD (Neural Networks) and discuss them.
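
Tasks 1 and 2 above might be approached along the following lines with scikit-learn. This is only a sketch on synthetic stand-in data: the feature names, the depth limit of 3 used for pre-pruning, and the choice of the pruning strength by test accuracy are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the prepared Abalone data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

# Pre-pruning: cap the depth so the tree stays small enough to read as rules.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
rules = export_text(shallow, feature_names=feature_names)
print(rules)  # each root-to-leaf path reads as IF <conditions> THEN <class>

# Post-pruning: try every alpha on the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
scores = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off values
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    scores.append((tree.score(X_test, y_test), alpha, tree.get_n_leaves()))

# max() on tuples prefers higher accuracy, then the larger (more pruned) alpha.
best_acc, best_alpha, best_leaves = max(scores)
print(f"best test accuracy {best_acc:.3f} with ccp_alpha={best_alpha:.5f} "
      f"({best_leaves} leaves)")
```

For the tree figure itself, sklearn.tree.plot_tree draws the same fitted tree graphically.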

Note that performance refers to accuracy, which could be either classification accuracy or F1 score.
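
To illustrate Task 3 with both metrics, the sketch below grows Random Forests of increasing size and records accuracy and macro-averaged F1 on the test split. The data is a synthetic stand-in, and the ensemble sizes and the macro averaging are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared Abalone data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

history = []
for n_trees in (1, 5, 10, 50, 100):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    pred = forest.predict(X_test)
    history.append((n_trees,
                    accuracy_score(y_test, pred),
                    f1_score(y_test, pred, average="macro")))

for n_trees, acc, f1 in history:
    print(f"{n_trees:>3} trees: accuracy {acc:.3f}, macro F1 {f1:.3f}")
```

Repeating the loop with different random_state values gives the per-size samples needed for the mean and confidence interval.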

Abalone dataset from UCI ML repository: https://archive.ics.uci.edu/ml/datasets/Abalone. In this case, provide visualisation and analysis of your data first, as required in Part A: Option I.

Click on the submit button to submit your code.

We recommend that you use Scikit-learn in case of Python since it runs faster on Ed.

Option II - B: Report Task

Write a report to describe the steps performed in Part A to develop the model and evaluate its performance.

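Provide brief justifications, with clearly articulated reasons, for the steps you took to build the model you submitted. Please note that you are free to use your own writing style and should provide references as needed. The following suggestions/guidelines are not mandatory and are provided mainly for informational purposes.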

Suggestions/Guidelines for Presentation style/format

· Use the IEEE Conference paper template in LaTeX or Word:

https://www.ieee.org/conferences/publishing/templates.html

https://www.overleaf.com/latex/templates/ieee-conference-template-example/nsncsyjfmpxy

· Your report should have the following sections: Problem definition (abstract and introduction), methodology, results, and conclusion. To get more information on these sections, click here. You are encouraged to cite at least 10 references in your technical report. Note that the introduction highlights the literature, the aim and goals, and the general problem you are trying to solve. You are free to use different section titles or report style, although this style of reporting is encouraged.

· Quantitative information should be clearly described and appropriately communicated (e.g. using figures and tables that are appropriately labelled).

· There is no strict word limit.

· Your report should be written using correct spelling, grammar, and punctuation.

· Follow IEEE referencing style.

· You need to submit code that runs in Ed. If your report takes into account N=10 experiments, for example, your submitted code needs to have N=1 in the for loop that repeats the experiments. You should use functions/methods. You can upload the results that are used to generate the plots that will be part of your report, and keep the plotting code as a separate file if you wish.

· You can also submit a readme.txt with your submission that gives an overview of your files/code.

Writing tips:

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606

How to submit

This assessment is submitted through Turnitin on the Assessment submission page in Moodle.

Do not include any code in your report that will be submitted to Moodle.

Evaluation - Rubrics

A. Overall presentation

· The report has an excellent presentation. The introduction clearly defines the aim and goals of the report with a clear review of the literature. The results and discussion section has been presented very well. (25%)

· The report has a good presentation. The introduction clearly defines the aim and goals of the report. The results and discussion section has been presented well, but some issues are present. (20%)

· The report has some presentation issues. The introduction does not clearly define the aim and goals of the report. The results and discussion section has not been presented very well. (15%)

· The report has a poor presentation. The introduction has missing aim and goals. The results and discussion section is questionable or not complete. (10%)

· No submission/results not correct. (0%)

B. Depth of discussion and presentation of results

· In-depth discussion & elaboration in all sections of the report. (25%)

· In-depth discussion & elaboration in most sections of the report. (20%)

· The writer has omitted pertinent content or content runs on excessively. (15%)

· Cursory discussion in all the sections of the report or brief discussion in only a few sections. (10%)

· No submission/results. (0%)

C. Cohesiveness

· Ties together information from all sources. Report flows from one issue to the next clearly. Author's writing demonstrates an understanding of the relationship among material obtained from all sources. (25%)

· For the most part, ties together information from all sources. Report flows with only some disjointedness. Author's writing demonstrates an understanding of the relationship among material obtained from all sources. (20%)

· Sometimes ties together information from all sources. Report does not flow - disjointedness is apparent. Author's writing does not demonstrate an understanding of the relationship among material obtained from all sources. (15%)

· Does not tie together information. Report does not flow and appears to be created from disparate issues. Headings are necessary to link concepts. Writing does not demonstrate understanding of any relationships. (10%)

· Not coherent/no submission. (0%)

D. Sources and citations

· Relevant sources cited and provides a proper overview of sources in the text. (25%)

· Relevant sources cited and provides a proper overview of sources in discussion, but some minor issues present. (20%)

· Sources are credible; however, there are mistakes in citations. (15%)

· Does not cite and discuss the source properly. (10%)

· No citations/submission. (0%)

Note that plagiarism will automatically imply 0 marks.

Adapted from: https://www.cornellcollege.edu/library/faculty/focusing-on-assignments/tools-for-assessment/research-paper-rubric.shtml
