
Assessment 3: Overview

Published: 2023-11-18


Weight - 25%

Due: Friday Week 10, 10:00 PM

Expected time

Allow approximately 25-30 hours to complete this assessment. Please note that the estimated workload may vary depending on your technical background.

What you need

The required software for the modelling in part A is available on the slide displayed underneath this overview slide.

Instructions

Part A) You will complete this part of the assessment in Ed. Choose one of the three options given:

Option I: Foundations of Neural Networks

Option II: Tree and Ensemble Learning

Option III: Machine Learning Research Project

You can use Python or R, or both, depending on what is suitable. You can also use a Python notebook or R Markdown.

You are free to use your own IDE and PC. If you are not using Ed, you just need to upload screenshots of your code.

Part B) Write a report to describe the steps performed to develop the model and evaluate its performance. Provide written justifications, with clearly articulated reasons, for the steps you took to build the model.

How to submit

Part A (code and data) is submitted via the Ed learning platform. Part B (report pdf) is submitted through the Assessment submission page in Moodle.

In Ed, our preference is model.py or model.r for the main code, which should read the data and run. Upload screenshots of the console if your code does not run on Ed and you are using a local machine to run it.

We recommend that you use scikit-learn if you work in Python, since it runs faster on Ed.

In the case of a group submission, only one member is required to submit. The report must list the names, emails and student IDs of all group members.

Do not include any code in your report that will be submitted to Moodle.

Marking and feedback

Feedback and results will be provided approximately one week after you submit your assessment. Please note that, due to the large number of students, the feedback could be delayed.

Penalty for late submissions: UNSW has a standard late submission penalty of 5% per day for all assessments where a penalty applies, capped at five days (120 hours) from the assessment deadline, after which a student cannot submit an assessment, with no variation permitted.

Abalone shells: https://en.wikipedia.org/wiki/Abalone

PS. Can be eaten: https://www.foodrepublic.com/1294247/what-is-abalone-and-why-do-we-love-it/

Option I - A: Foundations of Neural Networks

Dataset

"Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem." Source: http://archive.ics.uci.edu/ml/datasets/Abalone

You need to treat the project as a multiclass classification problem. Show results for the ring age classified into 4 major groups (i.e. 4 output neurons), using the following ring age groups:

Class 1: 0 - 7 years

Class 2: 8 - 10 years

Class 3: 11 - 15 years

Class 4: Greater than 15 years
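The grouping above can be sketched as a small binning helper. This is only an illustration: the function name `rings_to_class` is hypothetical, and it assumes the boundaries are applied directly to the value in the dataset's Rings column, so check that interpretation against the assessment before relying on it.

```python
import numpy as np

def rings_to_class(rings):
    """Map ring values to the four classes (1-4) listed above."""
    # Inclusive upper bounds of Classes 1-3; anything above 15 is Class 4.
    upper_bounds = np.array([7, 10, 15])
    # right=True puts a value equal to a bound in the lower class (7 -> Class 1).
    return np.digitize(rings, upper_bounds, right=True) + 1

example_rings = np.array([3, 7, 8, 10, 11, 15, 16, 29])
example_classes = rings_to_class(example_rings)
print(example_classes)  # [1 1 2 2 3 3 4 4]
```

Scikit-learn classifiers accept these labels as they are; a Keras model with 4 output neurons would typically need them shifted to 0-3 or one-hot encoded.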

Build a neural network model using either Keras or scikit-learn, in Python or R.

Part A

Analyse and visualise the given data set by reporting the distribution of classes, the distribution of features and any other visualisation you find appropriate.

Investigate the effect of the number of hidden neurons (e.g. 5, 10, 15, 20) for a single hidden layer (using SGD).

Investigate the effect of the learning rate (using SGD) for the selected data set (using the optimal number of hidden neurons).

Investigate the effect of a different number of hidden layers (1, 2) with the optimal number of hidden neurons (from Part 4).

Investigate the effect of L2 regularisation or dropout (either one, since dropout is not available in sklearn) on the best model so far and provide a discussion.

Investigate the effect of Adam and SGD on training and test performance for the best model so far and provide a discussion.

Evaluate the best* model using the classification accuracy score. Take the best run and show the confusion matrix and ROC/AUC curve for the different classes of the multi-class problem. *The best model should be based on performance on the test dataset.

You need to do some research about ROC/AUC for multiclass problems. We have applied it here: Khan, A. A., Chaudhari, O., & Chandra, R. (2023). A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation. arXiv preprint arXiv:2304.02858. https://arxiv.org/abs/2304.02858
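One common way to extend ROC/AUC to several classes is one-vs-rest, where each class in turn is treated as the positive class. The sketch below shows that approach with scikit-learn. It is not necessarily the method used in the cited paper, and the data are a synthetic 4-class stand-in for Abalone (an assumption for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import label_binarize

# Synthetic 4-class data standing in for the Abalone features and classes.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(10,), solver="adam",
                      max_iter=300, random_state=0).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

# One-vs-rest: binarise the labels and compute one ROC curve per class.
proba = model.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=model.classes_)
roc_per_class = {}
for k, label in enumerate(model.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], proba[:, k])
    roc_per_class[label] = (fpr, tpr)

# Macro-averaged one-vs-rest AUC across the four classes.
macro_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(cm)
print(f"macro one-vs-rest AUC: {macro_auc:.3f}")
```

Each `(fpr, tpr)` pair in `roc_per_class` can be drawn as one line on a shared plot, giving one ROC curve per class.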

Note that tasks 2 to 6 require 10 experimental runs (with different initial weights) for each case, where you report the mean and 95% confidence interval (you can compute the confidence interval using the standard deviation). You need to select the appropriate metrics, i.e. for classification, report performance on the train and test data sets.

Use a 60/40 percent train/test split for the given data set (the data split remains fixed across experiments). Note that there is no need to have a validation set.
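The experimental protocol described above (a fixed 60/40 split, 10 runs with different initial weights, mean and 95% confidence interval) can be sketched as follows. This is a minimal sketch, not a required implementation: the helper names `mean_ci95` and `run_experiments` are hypothetical, the data are synthetic, and the interval uses the normal approximation 1.96 · s / √n (with only 10 runs, the Student-t critical value 2.262 would give a slightly wider interval).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def mean_ci95(scores):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    scores = np.asarray(scores, dtype=float)
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return scores.mean(), half_width

def run_experiments(X_train, X_test, y_train, y_test, hidden, n_runs=10):
    """Train n_runs SGD networks that differ only in their initial weights."""
    train_acc, test_acc = [], []
    for run in range(n_runs):
        model = MLPClassifier(hidden_layer_sizes=(hidden,), solver="sgd",
                              learning_rate_init=0.01, max_iter=100,
                              random_state=run)  # the seed sets the initial weights
        model.fit(X_train, y_train)
        train_acc.append(model.score(X_train, y_train))
        test_acc.append(model.score(X_test, y_test))
    return mean_ci95(train_acc), mean_ci95(test_acc)

# Synthetic stand-in data; the split is made once and reused for every run.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

results = {}
for hidden in (5, 10, 15, 20):
    results[hidden] = run_experiments(X_train, X_test, y_train, y_test, hidden)
    (tr_mean, tr_ci), (te_mean, te_ci) = results[hidden]
    print(f"hidden={hidden:2d}  train {tr_mean:.3f} +/- {tr_ci:.3f}  "
          f"test {te_mean:.3f} +/- {te_ci:.3f}")
```

The same loop structure can be reused for the learning rate, the number of hidden layers and the choice of optimiser by changing which argument varies.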

Part B

Take the best model and build a regression model where you predict the age rather than classifying it into the 4 classes. Report the RMSE mean and 95% confidence interval of 10 experiments and compare SGD with the Adam optimiser. Visualise the prediction vs actual values of the best model. You are free to decide which type of plot suits, and can use several plots for visualisation.
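A minimal sketch of the regression comparison, assuming scikit-learn's `MLPRegressor` and a synthetic target that stands in for the age values. The helper names, network size and learning rate are illustrative choices, not values prescribed by the assessment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_ci95(values):
    """Mean and half-width of an approximate (normal) 95% confidence interval."""
    values = np.asarray(values, dtype=float)
    return values.mean(), 1.96 * values.std(ddof=1) / np.sqrt(len(values))

# Synthetic stand-in for the Abalone features and target (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = 10 + 2 * np.tanh(X[:, 0]) + X[:, 1] + rng.normal(scale=0.5, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

rmse_summary = {}
for solver in ("sgd", "adam"):
    test_rmse = []
    for run in range(10):  # 10 runs with different initial weights
        model = MLPRegressor(hidden_layer_sizes=(10,), solver=solver,
                             learning_rate_init=0.001, max_iter=300,
                             random_state=run)
        model.fit(X_train, y_train)
        test_rmse.append(rmse(y_test, model.predict(X_test)))
    rmse_summary[solver] = mean_ci95(test_rmse)
    mean, ci = rmse_summary[solver]
    print(f"{solver}: test RMSE {mean:.3f} +/- {ci:.3f}")
```

For the prediction vs actual plot, the pairs `(y_test, model.predict(X_test))` from the best run can be passed to a scatter plot with a diagonal reference line.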

Part C

Feature additional visualisation such as error (residual) plots from Part B on the train and test split for the best model over time, and any other visualisations of the training/test performance.

Hybrid dropout and weight decay from Part B: using Adam, compare L2 regularisation (weight decay) with dropout. Show results for 3 (or more) different combinations of hyperparameters (dropout rate with the weight decay hyperparameter λ).

Submission types: As a MATH3856 student submitting individually, you need to do only Part A; as a MATH5836 student submitting individually, you need to do Part A and Part B. A MATH3856 group needs to do Part A and Part B, and a MATH5836 group needs to do Parts A, B and C.

Installation: You should install the required libraries, run the experiments on your personal computer and upload the results/code to Ed later. Note that the code will not be evaluated.

Marks will be given only for your report. You can also submit a readme.txt with your submission that gives an overview of your files/code. We need your code for a plagiarism check in case we are suspicious about your report.

How to submit

Click on the submit button to submit your code.

In Ed, our preference is model.py or model.r for the main code, which should read the data and run. Upload screenshots of the console if your code does not run on Ed and you are using a local machine to run it.

We recommend that you use scikit-learn if you work in Python, since it runs faster on Ed.

DISCUSSION

What is a good model? You need to decide that with trial runs, i.e. how many iterations are needed to get good performance on the train and test datasets. You can make convergence plots and then decide when it is best to stop training.
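One way to collect the numbers for such a convergence plot is to train one epoch at a time and record the train and test accuracy after each pass. The sketch below assumes scikit-learn's `MLPClassifier.partial_fit` and synthetic data. The 50-epoch budget is arbitrary, and picking the epoch by test accuracy follows the assessment's note that the best model is judged on the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 4-class stand-in for the Abalone data (illustration only).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                      learning_rate_init=0.01, random_state=0)
classes = np.unique(y_train)

history = []  # one (epoch, train accuracy, test accuracy) row per epoch
for epoch in range(1, 51):
    # Each partial_fit call performs a single pass over the training data.
    model.partial_fit(X_train, y_train, classes=classes)
    history.append((epoch,
                    model.score(X_train, y_train),
                    model.score(X_test, y_test)))

best_epoch = max(history, key=lambda row: row[2])[0]
print(f"best test accuracy reached at epoch {best_epoch}")
```

Plotting the second and third columns of `history` against the first gives the train and test convergence curves.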

Option I - B: Report

Task

Write a report to describe the steps performed to develop the model and evaluate its performance.

Provide brief justifications, with clearly articulated reasons, for the steps you took to build the model you submitted.

Presentation style/format

Use the IEEE Conference paper template in LaTeX or Word: https://www.ieee.org/conferences/publishing/templates.html and https://www.overleaf.com/latex/templates/ieee-conference-template-example/nsncsyjfmpxy

Your report should have the following sections: problem definition (abstract and introduction), methodology, results, and conclusion. More information on these sections is given in the report-writing guide later in this document. You are encouraged to cite at least 10 references in your technical report (if you do Option III, a comprehensive literature review would be needed). Note that the introduction highlights the literature, the aim and goals, and the general problem you are trying to solve. You are free to use different section titles or a different report style, although this style of reporting is encouraged.

Quantitative information should be clearly described and appropriately communicated (e.g. using figures and tables that are appropriately labelled).

There is no strict word limit.

Your report should be written using correct spelling, grammar, and punctuation.

Follow IEEE referencing style.

You need to submit code that runs in Ed. If your report takes into account N=10 experiments, for example, your submitted code still needs to have N=1 in the for loop that repeats the experiments. You should use functions/methods. You can upload the result files used to generate the plots that appear in your report, and keep the plotting code in a separate file if you wish.

You can also submit a readme.txt with your submission that gives an overview of your files/code.

GUIDELINE

You are free to follow your own style (in terms of writing, not the report template). This is just for information and it is not mandatory to follow it: https://www.linkedin.com/pulse/how-write-scientific-paper-rohitash-chandra/?trackingId=%2BPyBv7qHQiutfqY%2FbWNnMw%3D%3D

Writing tips:

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606

How to submit

This assessment is submitted through Turnitin on the Assessment submission page in Moodle.

Do not include any code in your report that will be submitted to Moodle.

Option II - A: Tree and Ensemble Learning

This option features components from decision trees, random forests and ensemble learning.

Using the multiclass Abalone dataset from Option I, apply CART to the same problem and report the classification performance on the train and test sets using the same train/test split. Abalone dataset from the UCI ML repository: https://archive.ics.uci.edu/ml/datasets/Abalone.

Part A

Provide visualisation and analysis of your Abalone multi-class data (similar to the first part of Assessment 2).

Create a Decision Tree for the Abalone multi-class data and report train and test performance for multiple experimental runs (5 or more) using different hyperparameters, i.e. tree depth or any other hyperparameter of your choice. Take the best tree and report the tree visualisation (show your tree and also translate a few selected nodes and leaves into IF-THEN rules). Note: since Decision Trees give the same results for the same dataset, ensure that in different experimental runs you create a different train/test split, as done in the Week 1 and Week 2 exercise solutions.
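The decision tree task above can be sketched with scikit-learn as follows. The data are a synthetic 4-class stand-in for Abalone, and the feature names, candidate depths and helper name `evaluate_depth` are placeholders chosen for illustration. `export_text` prints the tree as nested threshold conditions, which can be read off as IF-THEN rules.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic 4-class data with placeholder feature names (illustration only).
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

def evaluate_depth(max_depth, n_runs=5):
    """Mean train/test accuracy over runs that each use a different split."""
    train_scores, test_scores = [], []
    for run in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.4, random_state=run)  # new split for every run
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        tree.fit(X_tr, y_tr)
        train_scores.append(tree.score(X_tr, y_tr))
        test_scores.append(tree.score(X_te, y_te))
    return float(np.mean(train_scores)), float(np.mean(test_scores))

depth_results = {depth: evaluate_depth(depth) for depth in (2, 3, 5, 8)}
best_depth = max(depth_results, key=lambda d: depth_results[d][1])

# Refit at the chosen depth and print the top levels of the tree as text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)
best_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
best_tree.fit(X_tr, y_tr)
rules = export_text(best_tree, feature_names=feature_names, max_depth=2)
print(rules)
```

A printed branch such as `f3 <= 0.12` followed by a `class:` leaf reads as "IF f3 <= 0.12 THEN predict that class". `sklearn.tree.plot_tree` can draw the same tree graphically.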

Investigate improving performance further by either pre-pruning or post-pruning the tree: https://scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html
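A minimal post-pruning sketch in the spirit of the linked scikit-learn example, using cost-complexity pruning on synthetic stand-in data. Selecting the pruning strength `ccp_alpha` by test accuracy is an assumption made here for brevity. Pre-pruning would instead limit growth up front, for example through `max_depth` or `min_samples_leaf`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class stand-in for the Abalone data (illustration only).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Fully grown tree and the candidate pruning strengths along its pruning path.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Guard against tiny negative values caused by floating-point error.
alphas = np.unique(np.maximum(path.ccp_alphas, 0.0))

# Refit one tree per alpha and keep the one with the best test accuracy.
scores = []
for alpha in alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    scores.append(tree.score(X_test, y_test))

best_alpha = float(alphas[int(np.argmax(scores))])
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
pruned_tree.fit(X_train, y_train)
print(f"nodes: full={full_tree.tree_.node_count}, "
      f"pruned={pruned_tree.tree_.node_count}, ccp_alpha={best_alpha:.5f}")
```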

Apply Random Forests and show performance (e.g. accuracy score) as the number of trees in the ensemble increases.
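The effect of ensemble size can be tabulated with a short loop over `n_estimators`. This is a sketch on synthetic stand-in data, and the tree counts tried are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 4-class stand-in for the Abalone data (illustration only).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

forest_scores = {}
for n_trees in (1, 10, 50, 100, 200):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    forest_scores[n_trees] = (forest.score(X_train, y_train),
                              forest.score(X_test, y_test))
    train_acc, test_acc = forest_scores[n_trees]
    print(f"trees={n_trees:3d}  train={train_acc:.3f}  test={test_acc:.3f}")
```

Plotting the test accuracy in `forest_scores` against the number of trees shows where adding more trees stops helping.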

Further, compare your results with XGBoost and Gradient Boosting and provide a discussion.

Compare results with Adam/SGD (simple neural networks) and discuss them. You can use the default hyperparameters from the sklearn library; there is no need for the extensive hyperparameter search needed in Option I.

Note that performance refers to accuracy, which could be classification accuracy, AUC or F1 score. You can report all or selected types of scores.

Part B: Apply the above steps to the Abalone regression problem. Compare results with the linear regression model from the earlier assessment.

Part C: Apply the above steps to another dataset of your choice.

Submission types: As a MATH3856 student submitting individually, you need to do only Part A; as a MATH5836 student submitting individually, you need to do Part A and Part B. A MATH3856 group needs to do Part A and Part B, and a MATH5836 group needs to do Parts A, B and C.

Click on the submit button to submit your code.

In Ed, our preference is model.py or model.r for the main code, which should read the data and run. Upload screenshots of the console if your code does not run on Ed and you are using a local machine to run it.

We recommend that you use scikit-learn if you work in Python, since it runs faster on Ed.

Option II - B: Report

Task

Write a report to describe the steps performed to develop the model and evaluate its performance.

Provide brief justifications, with clearly articulated reasons, for the steps you took to build the model you submitted.

Presentation style/format

Use the IEEE Conference paper template in LaTeX or Word: https://www.ieee.org/conferences/publishing/templates.html and https://www.overleaf.com/latex/templates/ieee-conference-template-example/nsncsyjfmpxy

Your report should have the following sections: problem definition (abstract and introduction), methodology, results, and conclusion. More information on these sections is given in the report-writing guide later in this document. You are encouraged to cite at least 10 references in your technical report (if you do Option III, a comprehensive literature review would be needed). Note that the introduction highlights the literature, the aim and goals, and the general problem you are trying to solve. You are free to use different section titles or a different report style, although this style of reporting is encouraged.

Quantitative information should be clearly described and appropriately communicated (e.g. using figures and tables that are appropriately labelled).

There is no strict word limit.

Your report should be written using correct spelling, grammar, and punctuation.

Follow IEEE referencing style.

You need to submit code that runs in Ed. If your report takes into account N=10 experiments, for example, your submitted code still needs to have N=1 in the for loop that repeats the experiments. You should use functions/methods. You can upload the result files used to generate the plots that appear in your report, and keep the plotting code in a separate file if you wish.

You can also submit a readme.txt with your submission that gives an overview of your files/code.

GUIDELINE

You are free to follow your own style (in terms of writing, not the report template). This is just for information and it is not mandatory to follow it: https://www.linkedin.com/pulse/how-write-scientific-paper-rohitash-chandra/?trackingId=%2BPyBv7qHQiutfqY%2FbWNnMw%3D%3D

Writing tips:

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606

How to submit

This assessment is submitted through Turnitin on the Assessment submission page in Moodle.

Do not include any code in your report that will be submitted to Moodle.

Option III - B: Report

Task

Write a report to describe the steps performed to develop the model and evaluate its performance.

Provide brief justifications, with clearly articulated reasons, for the steps you took to build the model you submitted.

Presentation style/format

Use the IEEE Conference paper template in LaTeX or Word: https://www.ieee.org/conferences/publishing/templates.html and https://www.overleaf.com/latex/templates/ieee-conference-template-example/nsncsyjfmpxy

Your report should have the following sections: problem definition (abstract and introduction), methodology, results, and conclusion. More information on these sections is given in the report-writing guide later in this document. You are encouraged to cite at least 10 references in your technical report (if you do Option III, a comprehensive literature review would be needed). Note that the introduction highlights the literature, the aim and goals, and the general problem you are trying to solve. You are free to use different section titles or a different report style, although this style of reporting is encouraged.

Quantitative information should be clearly described and appropriately communicated (e.g. using figures and tables that are appropriately labelled).

There is no strict word limit.

Your report should be written using correct spelling, grammar, and punctuation.

Follow IEEE referencing style.

You need to submit code that runs in Ed. If your report takes into account N=10 experiments, for example, your submitted code still needs to have N=1 in the for loop that repeats the experiments. You should use functions/methods. You can upload the result files used to generate the plots that appear in your report, and keep the plotting code in a separate file if you wish.

You can also submit a readme.txt with your submission that gives an overview of your files/code.

GUIDELINE

You are free to follow your own style (in terms of writing, not the report template). This is just for information and it is not mandatory to follow it: https://www.linkedin.com/pulse/how-write-scientific-paper-rohitash-chandra/?trackingId=%2BPyBv7qHQiutfqY%2FbWNnMw%3D%3D

Writing tips:

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606

How to submit

This assessment is submitted through Turnitin on the Assessment submission page in Moodle.

Do not include any code in your report that will be submitted to Moodle.

Evaluation - Rubrics

A. Overall presentation

The paper has an excellent presentation. The introduction clearly defines the aim and goals of the paper with a clear review of the literature. Results and discussion section has been presented very well. (25 %)

The paper has a good presentation. The introduction clearly defines the aim and goals of the paper. Results and discussion section has been presented well but some issues present. (20%)

The paper has some presentation issues. The introduction does not clearly define the aim and goals of the paper. Results and discussion section has not been presented very well. (15 %)

The paper has a poor presentation. The introduction has missing aim and goals. Results and discussion section is questionable or not complete. (10 %)

No submission/results not correct (0 %)

B. Depth of discussion and presentation of results

In-depth discussion & elaboration in all sections of the paper. (25 %)

In-depth discussion & elaboration in most sections of the paper. (20 %)

The writer has omitted pertinent content or content runs-on excessively. (15 %)

Cursory discussion in all the sections of the paper or brief discussion in only a few sections. (10 %)

No submission/results. (0 %)

C. Cohesiveness

Ties together information from all sources. Paper flows from one issue to the next clearly. Author's writing demonstrates an understanding of the relationship among material obtained from all sources. (25 %)

For the most part, ties together information from all sources. Paper flows with only some disjointedness. Author's writing demonstrates an understanding of the relationship among material obtained from all sources. (20 %)

Sometimes ties together information from all sources. Paper does not flow - disjointedness is apparent. Author's writing does not demonstrate an understanding of the relationship among material obtained from all sources. (15 %)

Does not tie together information. Paper does not flow and appears to be created from disparate issues. Headings are necessary to link concepts. Writing does not demonstrate an understanding of any relationships. (10 %)

Not coherent/ no submission (0 %)

D. Sources and citations

Relevant sources cited and provides a proper overview of sources in the text. (25 %)

Relevant sources cited and provides a proper overview of sources in discussion but some minor issues present. (20 %)

Sources are credible, however, mistakes in citations. (15 %)

Does not cite and discuss the source properly. (10 %)

No citations/submission (0%)

Note that plagiarism will automatically imply 0 marks.

Adapted from: https://www.cornellcollege.edu/library/faculty/focusing-on-assignments/tools-for-assessment/research-paper-rubric.shtml

Do not include any code in your report that will be submitted to Moodle.

Report writing

Title of project: Try not to use abbreviations; the title should be simple and informative.

Abstract

Gives an overview (summary) of the paper. Highlight the problem and give the aim and goals, contributions and results. How do you improve the body of knowledge? What are your major contributions? What makes your paper notable, i.e. what makes your work a significant addition to the literature? Can your work be easily reproduced, i.e. are the code/software and data source openly available?

Limit abbreviations. Do not cite papers in the Abstract.

(recommendation: use past/present tense and active voice)

1. Introduction

First paragraph: opening statement, e.g. "Neural networks along with deep learning models have gained a lot of popularity recently due to their success in multimedia applications [1][2] such as face recognition, biometrics, autonomous driving systems etc [3][4]." Ensure you cite review papers, old and new ones where possible, when you are trying to summarize a field. Ensure that you cite the foundational papers, i.e. which papers contributed to the field of neural networks, decision trees, ensemble methods, etc.? (recommendation: use past tense and active voice) (this is literature review)

Second paragraph: highlight some challenges of neural networks or time series problems (e.g. big data problems, overfitting/bias-variance issue, hyperparameter tuning). (recommendation: use past tense and active voice) (this is literature review)

Third paragraph: discuss other issues relating to your paper. This could be a related issue bringing together multiple fields. Cite your own work where possible if you have already worked in the area, and highlight the contributions. (recommendation: use past tense and active voice) (this is literature review)

Fourth paragraph (motivation paragraph): why do you want to do this project? e.g. not enough work has been done that evaluates the key parameters for neural networks, or not enough work has been done that evaluates temporal CNNs for climate-related problems. (recommendation: use present tense and active voice)

Fifth paragraph (contribution paragraph): In this paper or project, we investigate (expand on the title of the project). Then list the major goals (what would you test: hyperparameters? Which ones? Algorithms? Which ones? What dataset would you use? Classification problem or regression?). (use present tense and active voice)

Overview: the rest of the paper is organised as follows. In Section 2 …. In Section 3 … Finally, we conclude the paper in Section xxx

2. Background and Related Work (Optional for A3)

This section can have subsections that are related to your methodology or application problem, where you cover major details that cannot be given in the Introduction. This includes background information such as the Adam training algorithm, basic details of MCMC, and basic details of the Genetic Algorithm. The aim of this section is to help general readers; this is useful in multidisciplinary papers so that readers find all information in one place. You can also review recent literature in this area that is not directly related to your project/paper but is close to it. This helps to position your paper: it shows where your neighbours are hanging out - well, kind of ...

(recommendation: use past/present tense and active voice)

3. Methodology

Data - describe the original data (if it is an application problem) and also the way the data is processed. (recommendation: use present tense and active voice)

Overview of the methods (LSTM, CNN etc). (recommendation: use past tense and active voice)

Framework or workflow diagram - to highlight data input, processing, model, predictions and post-processing of results. (recommendation: use present tense and active voice)

Software suite - library. (recommendation: use present tense and active voice)

Experiment setting, where you give default settings, such as the neural network topology, that will not be evaluated in the experiments. The goal of this section is to ensure that others can reproduce what you have done; it is best to give a GitHub repository link in this section or place the link at the end of the paper. (recommendation: use present tense and active voice)

4. Results

You need to lay out the results: use present tense and active voice, and describe the results as if you are summarizing them for someone who cannot see them.

e.g. "In this section, we present the results of our proposed methodology. Table 1 presents the effect of the training time on the performance accuracy and Figure 2 shows the convergence plot. We notice a trend that the performance accuracy improves as the time increases but there comes a time (200 epochs) where there is not much improvement. This is not the case for the conventional method (Bayes-NN) where it takes more than 500 epochs to reach such a point."

Follow these as examples of how results are described and discussed:

Chandra R; Saini R, 2021, 'Biden vs Trump: Modeling US General Elections Using BERT Language Model', IEEE Access, vol. 9, pp. 128494–128505, http://dx.doi.org/10.1109/ACCESS.2021.3111035

Chandra R; Goyal S; Gupta R, 2021, 'Evaluation of Deep Learning Models for Multi-Step Ahead Time Series Prediction', IEEE Access, vol. 9, pp. 83105–83123, http://dx.doi.org/10.1109/ACCESS.2021.3085085

Chandra R; Krishna A, 2021, 'COVID-19 sentiment analysis via deep learning during the rise of novel cases', PLoS ONE, vol. 16, pp. e0255615, http://dx.doi.org/10.1371/journal.pone.0255615

R. Chandra and V. Kulkarni, "Semantic and Sentiment Analysis of Selected Bhagavad Gita Translations Using BERT-Based Language Framework," in IEEE Access, vol. 10, pp. 21291–21315, 2022, https://ieeexplore.ieee.org/document/9715095

Chandra R; He Y, 2021, 'Bayesian neural networks for stock price forecasting before and during COVID-19 pandemic', PLoS ONE, vol. 16, pp. e0253217, http://dx.doi.org/10.1371/journal.pone.0253217

Please note that the above is only to guide you. Different fields have different requirements and styles, and you need to follow those requirements. (recommendation: use present tense and active voice)

5. Discussion

summarise your results: have you reached your goals?

implications of the results: can you relate them to the literature?

limitations of the methodology and results

(recommendation: use past tense and active voice)

6. Conclusions

what are the major contributions?

directions for future research

(recommendation: use past tense and active voice)

References

[1] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), pp. 359–366. Retrieved from https://www.sciencedirect.com/science/article/abs/pii/0893608089900208

[2] Trenn, S. (2008). Multilayer perceptrons: Approximation order and a necessary number of hidden units. IEEE transactions on neural networks, 19(5), pp. 836–844. Retrieved from https://ieeexplore.ieee.org/document/4469950

APPENDIX

https://users.ics.aalto.fi/ntatti/howtowrite2016/tutorial.pdf

https://www.sciencedirect.com/science/article/pii/S1878764915001606