
Machine Learning COIY065H7

Reassessment 2021-2022

Coursework description, guidelines and marking scheme

1. Introduction

This assignment is an integral part of this module and contributes 20% to the overall mark.

Imagine that you are a member of a research group in a company. Your group leader has asked you to create an intelligent system based on neural networks to model wine quality from physicochemical properties; you should implement it, test it out, explore and explain its behaviour, optimise it, and perhaps even suggest ways it might be improved.

By doing this coursework you will get experience with implementing, running, adapting, and evaluating machine learning methods on real data. You will need to write, reuse or change code, run it on some data, make some figures, read a few background papers, present your results, and write a report describing the problem you tackled, the machine learning algorithm you used and the results you obtained.

For your experimental study you must use the following real-world Wine quality data: this dataset concerns red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult https://archive.ics.uci.edu/ml/datasets/Wine+Quality or the reference:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553, 2009 (https://doi.org/10.1016/j.dss.2009.05.016). The paper can be downloaded from ScienceDirect via Birkbeck's e-library; a pre-print is also available.

You should use the above data set. You cannot choose any data you wish.

You can use any programming language or software library for this assignment. In the labs we use MATLAB, which provides well-tested functions for neural network design that are appropriate for the UCI dataset mentioned above. You are not required to program everything from scratch, but if you reuse code or libraries you must cite the original source; otherwise you might be accused of plagiarism.

The assignment is further explained in Section 2. Section 3 of this document gives you an example of how to structure your report and explains the marking scheme. Section 4 presents the deadlines and submission instructions. Section 5 explains the penalties for late submissions, and Section 6 explains how the College deals with plagiarism. Sections 7 and 8 provide additional information on learning resources and referencing.

2. Implementation and experimentation

You can do your own implementation in MATLAB, write your own code, or build on a package/library from the internet. You are not tested on programming, so the coding style does not have to be perfect and your code does not have to be optimal, but it should obviously work correctly. I wouldn't recommend implementing all the methods required for training neural networks (e.g. backpropagation, derivative calculations, etc.) from scratch unless you are very experienced with Java, C++, Python or some other programming language or platform. No matter what you do/use, make sure that all sources and code taken from others, or from the internet, are cited properly in your Report; otherwise, you may be accused of plagiarism.

Some packages provide techniques for determining the optimal structure of machine learning models (e.g. the model architecture or the model hyperparameters) automatically as part of training. In that case, instead of performing experimental tests varying the number of free parameters of the model, these techniques can be used to find an appropriate structure for your model. Still, some of these methods may have their own parameters, which require fine-tuning.
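As a minimal sketch of the idea (in Python, though any language is allowed), an exhaustive grid search over candidate structures can be written by hand. The `evaluate` function below is a hypothetical placeholder: in your own work it would train the model with the given structure and return a validation score.

```python
# Minimal grid-search sketch: try each candidate structure, keep the best.
# `evaluate` is a hypothetical stand-in for "train a model with this
# structure and return its validation accuracy" -- substitute your own.

def evaluate(hidden_nodes, learning_rate):
    # Placeholder score; in practice, train and validate a real model here.
    return 1.0 / (1.0 + abs(hidden_nodes - 15) + abs(learning_rate - 0.01))

def grid_search(hidden_options, lr_options):
    best_score, best_config = float("-inf"), None
    for h in hidden_options:
        for lr in lr_options:
            score = evaluate(h, lr)
            if score > best_score:
                best_score, best_config = score, (h, lr)
    return best_config, best_score

config, score = grid_search([5, 10, 15, 20], [0.001, 0.01, 0.1])
print(config)  # the structure with the highest validation score
```

A library's automatic structure-selection routine does essentially this (often more cleverly), which is why its own settings, such as the search ranges above, still need to be reported and justified.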

Note that the performance results are sometimes more meaningful if a validation technique is used, such as k-fold cross validation (k=7 or k=10 is typically used), leave-one-out cross validation, or some form of Monte Carlo simulation. Lastly, the use of regularisation, provided in some software packages and in MATLAB, normally helps to get better results.
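For illustration, the index bookkeeping behind k-fold cross validation can be sketched in a few lines of Python (a hand-rolled version; library routines such as MATLAB's `cvpartition` do the same job):

```python
def kfold_indices(n_samples, k):
    """Partition sample indices 0..n_samples-1 into k disjoint test folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

# With k=10, each sample appears in exactly one test fold.
folds = kfold_indices(100, 10)
```

Each of the k models is trained on the training indices and scored on the held-out fold; the k scores are then averaged (and their spread reported).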

The results of your experiments should be stored in ASCII format, in a Jupyter notebook, or in notebook documents produced by other web-based interactive computational environments, specifying whether each result is from the training or the testing phase, and should be submitted together with your report (Moodle allows you to submit additional files; up to 5 files in total can be submitted). Check that these files can be opened and read correctly. Results should be presented using figures and tables and discussed in your Report, i.e. it is not enough just to submit a Python notebook or files with results; these are not accepted as a Report submission.

3. Assignment outline and marking scheme

Your work will be presented in a Report (notebook documents are not accepted as a Report). It is important that your Report is properly structured. Sections like the ones shown below could be included in your report to ensure good coverage of the topic. Approximately 2000-2500 words are expected to cover all aspects of the assignment in sufficient depth, but our marking is not based on the number of words used in the Report. What primarily matters is that you describe the design of your system/model, justify any choices you made, explain how things work, and make the model work with the data. You will also need to provide insight on how to (pre)process the data before feeding it into the model (if necessary) and do the training, how to debug the learned model (not only the training algorithm), and how to measure model performance and demonstrate its significance. You are not just being marked on how good the results of your training algorithm or neural model are. However, when awarding marks, the work described in the report is marked in comparison to the work of other students. The following marking scheme is used.

1. Methodology, design and technical contribution (40% of the mark): the appropriate use and sophistication of design methods, the overall methodology, and the implementation are marked here.

1.1 This part should normally describe clearly the approach used in your design and implementation and any relevant parameters (e.g. for neural network models this includes the number of hidden nodes, layers, type of activation functions, training algorithm parameters, etc.). If you are using a particular library or tool, you still need to describe how the methods/functions that you are using operate and what parameter values are used. Citing the library, tool, etc., just listing library functions, or just stating that "default settings are used" is not enough to get a good mark.
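To make concrete what "relevant parameters" means, the NumPy sketch below shows a single-hidden-layer network whose structure is fully determined by exactly the kinds of choices your report should state and justify; the particular values (11 inputs for the wine features, 15 hidden nodes, logistic activation) are illustrative assumptions, not a recommendation:

```python
import numpy as np

# Illustrative structure: 11 inputs (one per wine feature), 15 hidden
# nodes, logistic (sigmoid) activations, one output.
n_inputs, n_hidden, n_outputs = 11, 15, 1

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (n_inputs, n_hidden))   # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, n_outputs))  # hidden-to-output weights
b2 = np.zeros(n_outputs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    """One forward pass; training (e.g. backpropagation) is omitted."""
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

y = forward(rng.normal(0, 1, (5, n_inputs)))  # 5 dummy samples
```

Every name here (weight initialisation scale, activation choice, layer sizes) is a reportable design decision, whether you set it yourself or a library sets it for you.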

1.2 This part should describe and justify any special techniques/methods or parameters used in the stages of your methodology, e.g. in data preprocessing, in initialisation, or during training. Also, this part should describe and justify any techniques for normalisation or missing data, or other pre-processing or balancing methods, and whether you have used some form of cross-validation, regularisation, or weight decay, providing details of the particular method. As mentioned above, citing the library, tool, etc., or simply mentioning the library functions is not enough to get a high mark.
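As one concrete preprocessing example, z-score normalisation rescales each feature to zero mean and unit variance before training. A hand-rolled NumPy sketch (the dummy data is an assumption; the important detail is that the statistics are computed on the training set only and then reused on the test set):

```python
import numpy as np

def fit_standardiser(X_train):
    """Compute per-feature mean and std on the training data only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return mu, sigma

def standardise(X, mu, sigma):
    return (X - mu) / sigma

rng = np.random.default_rng(1)
X_train = rng.normal(5.0, 2.0, (100, 11))  # dummy data, 11 wine features
mu, sigma = fit_standardiser(X_train)
Z = standardise(X_train, mu, sigma)
```

Applying `standardise(X_test, mu, sigma)` with the training-set statistics avoids leaking test-set information into the preprocessing, which is the kind of justification this section expects.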

1.3 This part should describe your technical contribution: whether your code fully runs or not and how you have tested the code (especially important if your code does not run; it makes sense to show that you have made a serious attempt to identify any issues); whether the code is based on existing code and whether you have made any effort to combine different components; and whether a novel implementation has been produced and/or you have made your own modifications. If the implementation has problems/bugs, it does not automatically mean that you have failed the coursework.

2. Experiments, findings and discussion (50% of the mark): the experimental design and the systematicity of the investigation are marked here. You must present and discuss your experimental design and results. You are expected to run several experiments, explore the behaviour of the model during training under different conditions, and calculate basic statistics to summarise performance in training and testing. You can explore the behaviour of various neural network training algorithms if you want, or compare with other machine learning methods that you have implemented or reused; that work will enhance the experimental study. You also need to discuss the significance of your findings or of the comparisons that you have made. Your report must include at least two figures which graphically illustrate quantitative aspects of your results, such as training/testing errors, performance metrics for sets of learned parameters, algorithm outputs, descriptive statistics, etc.

For example, in this part you can use Excel or other packages to provide charts like the figure below, which uses error bars (Box and Whisker Charts in Excel) to show the performance of your trained model in terms of generalisation. The figure below shows generalisation with respect to the number of hidden nodes used in a neural network-based solution. Alternatively, one could use tables to provide the same information by giving, for each number of hidden nodes, the average value, the minimum value, and the maximum value of generalisation performance (in percentage of successfully recognised patterns) in the tests.

[Figure: generalisation performance (%), shown with error bars, against the number of hidden nodes (5-30); y-axis from 88% to 97%.]
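The tabular alternative just described (average, minimum and maximum per hidden-node count) amounts to simple descriptive statistics over repeated runs. A sketch with made-up numbers (substitute your own measurements):

```python
import numpy as np

# Hypothetical results: generalisation performance (%) from 5 repeated
# runs for each number of hidden nodes -- illustrative values only.
results = {
    5:  [89.0, 90.5, 88.0, 91.0, 89.5],
    10: [92.0, 93.5, 91.0, 94.0, 92.5],
    15: [94.0, 95.0, 93.5, 96.0, 94.5],
}

# For each structure: (mean, min, max) over the repeated runs.
summary = {h: (float(np.mean(r)), min(r), max(r)) for h, r in results.items()}
for h, (mean, lo, hi) in summary.items():
    print(f"{h:3d} hidden nodes: mean {mean:.1f}%, min {lo:.1f}%, max {hi:.1f}%")
```

Repeating each configuration several times (with different random initialisations or folds) is what makes the error bars, or the min/max columns, meaningful.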

You could also discuss the cost of the computations, e.g. referring to the number of training iterations required or the number of error function evaluations (see figure below for the neural network-based solution discussed above). In the same way, you can also present and discuss other aspects of the training phase, e.g. the impact of hyperparameters, folds, balancing methods, etc.

[Figure: number of error function evaluations (50-550) against the number of hidden nodes (5-30).]

In machine learning, an experimental comparison/evaluation is presented for a few methods and their parameters. Overall results are also presented in tables like the one below, which shows the average performance of two models, trained using two different training algorithms, on the same test set. In this case, performance is shown in terms of recognition success per class as well as average classification success. Confusion matrices can also be used.

Method                       Class 1 (%)   Class 2 (%)   Average success (%)
Algorithm 1-trained Model        83            96                93
Algorithm 2-trained Model        73            93                88
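Per-class recognition success of this kind is easily derived from a confusion matrix. A small self-contained sketch with made-up labels (the label vectors are illustrative only):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up test-set labels for a two-class problem (classes 0 and 1).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, 2)
per_class = cm.diagonal() / cm.sum(axis=1)  # recognition success per class
overall = cm.diagonal().sum() / cm.sum()    # overall classification success
```

Note that when class sizes differ, the overall success is a weighted (not simple) average of the per-class values, which is worth stating explicitly in your table captions.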

3. Conclusions (10% of the mark)

3.1 Provide an honest and justified overview/summary of your work and a critical view of the findings.

3.2 Identify areas for improvement; discuss what you could have done better (particularly important if you failed some of your targets or your results are not as expected).

4. Bibliography: this is not marked directly but is necessary, as it supports the justification of your methodology and methods and the claims/arguments in your report. When it comes to references, I prefer the Harvard referencing system (see Section 8), but no matter what system you use, try to be consistent and make sure that all sources are cited in the text of the report and are also listed in the bibliography section; this way you don't get in any trouble with plagiarism detection software.

Provide a list of the bibliographical/web sources you used (see Section 7). Include publication details and all information necessary to access the online resources. Sources should be cited in the text by (Author name, year) and appear in the references list in alphabetical order by author's last name. This also applies to websites, e.g. an online article/webpage should be listed in your references; for example:

(MLOSS,   2011). Machine learning open source repository. Available online at http://mloss.org/

NOTE: use of any text or code (even open-source code) taken from other sources should be clearly identified and referenced in your report to avoid plagiarism (see Section 6). If you are unsure which parts of your code need appropriate referencing, do consult the module lecturer.

4. Deadlines and submission instructions

Submission is only through Moodle and consists of the submission of a Report, and data files with your results and code as appendices (if existing MATLAB toolboxes or libraries have been used, these should be mentioned in the Report). Important parts of your code can be included and discussed in the report, but a notebook document, a copy of the MATLAB editor, etc. is not accepted as a Report submission. The preferred format for the Report is a Word document, but PDF or RTF are also accepted, with data and code embedded in the file. Your report and the code will be tested for plagiarism.

We are unfortunately unable to accept and mark reports that contain text embedded as an image into the document. These documents cannot go through the plagiarism detection system and will not be marked. Also, ZIP files are not accepted as they cannot go through the plagiarism detection system, and they will not be marked.

Make sure you are familiar with Moodle and able to upload your files (for example, you could test the system by uploading a test file). Hardcopy versions of the report/data/code files will not be accepted. We do not accept submissions by email.

You should upload the completed assignment on Moodle by August 9th, 2022 at 1:00pm (this is Moodle's clock/server time, not your PC's time; if you are planning to upload your files from a remote location, make sure you check Moodle's time and take into account time zone differences).

I would encourage you to include data and code as appendices in the Report and upload only one file, but sometimes this may not be possible. In any case, your files should be named according to your last name, for example: MAGOULAS_Report.doc, MAGOULAS_Results.xls, etc.

Your Report must have a cover page!

The cover page MUST have the following information:

Module title and code: Machine Learning- COIY065H7

Name: your first name and last name

Student ID: provide your ID

Emails: provide your College email AND the email you use, if different from your College email

Your Report should have an Appendix with a description of the data files and code submitted, when these are not included in the Report as appendices but are submitted separately. Any specific instructions on software use should also be included in an Appendix of the report, entitled "Instructions for using the code".

It is your responsibility to ensure that all files and documents transferred from your own machines are the latest versions and in the correct format, and that any programs execute as intended on the Department's systems prior to the submission date.

Each piece of submitted work MUST also have a page entitled "Academic Declaration" by the author that certifies that the author has read and understood the sections on plagiarism in the document http://www.bbk.ac.uk/mybirkbeck/services/rules/Assessment%20Offences.pdf that describes the College's Policy on assessment offences. Confirm that the work is your own, with the work of others fully acknowledged. Submissions must also be accompanied by a declaration giving us permission to submit your report to the plagiarism testing database that the College is using.

Reports without a Declaration form are not considered as completed assignments and are not marked.

The Academic Declaration should read as follows: "I have read and understood the sections on plagiarism in the College Policy on assessment offences and confirm that the work is my own, with the work of others clearly acknowledged. I give my permission to submit my report to the plagiarism testing database that the College is using and test it using plagiarism detection software, search engines or meta-searching software."

You should note that all original material is retained by the Department for reference by internal and external examiners when moderating and standardising the overall marks after the end of the module. Only original marks awarded will be shown on Moodle, i.e. any penalty or capping will be applied at a later stage after examining any mitigating circumstances claims.

5. Late coursework

It is our policy to accept and mark late submissions of coursework. You do not need to negotiate new deadlines and there is no need to obtain prior consent of the module lecturer.

The last day the system will accept a late submission for this module is August 19th, 2022 at 1:00pm (this is Moodle's time, not your PC's time; if you are planning to upload your files from a remote location, make sure you check Moodle's time and take into account time zone differences). All coursework submitted by the cut-off deadline will be marked.