COIY065H7 Machine Learning 2021-2022
Reassessment 2021-2022
Coursework description, guidelines and marking scheme
1. Introduction
This assignment is an integral part of this module and contributes 20% to the overall mark.
Imagine that you are a member of a research group in a company. Your group leader has asked you to create an intelligent system based on neural networks to model wine quality based on physicochemical properties; you should implement it, test it out, explore and explain its behaviour, optimise it, and perhaps even suggest ways it might be improved.
By doing this coursework you will get experience with implementing, running, adapting, and evaluating machine learning methods on real data. You will need to write, reuse or change code, run it on some data, make some figures, read a few background papers, present your results, and write a report describing the problem you tackled, the machine learning algorithm you used and the results you obtained.
For your experimental study you must use the following real-world Wine Quality data: this dataset concerns red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult https://archive.ics.uci.edu/ml/datasets/Wine+Quality or the following reference:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553, 2009 (https://doi.org/10.1016/j.dss.2009.05.016). The paper can be downloaded from ScienceDirect via Birkbeck’s e-library; a pre-print is also available.
You must use the above data set; you cannot choose any other data.
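The Wine Quality CSV files on the UCI page are semicolon-delimited with a quoted header row. A minimal loading sketch is shown below; the short inline sample stands in for a downloaded file such as winequality-red.csv, and only three of the twelve columns are shown here for brevity.

```python
import csv
import io

# Inline sample in the same format as the UCI winequality-*.csv files
# (semicolon delimiter, quoted header, quality score in the last column).
sample = (
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7.4;0.70;5\n"
    "7.8;0.88;5\n"
)

reader = csv.reader(io.StringIO(sample), delimiter=";")
header = next(reader)
rows = [[float(v) for v in row] for row in reader]

# Feature vectors and the quality target (last column).
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]
```

For the real data, replace the `io.StringIO(sample)` object with an open file handle on the downloaded CSV.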
You can use any programming language or software library for this assignment. In the labs, we use MATLAB, which provides well-tested functions for neural network design that are appropriate for the UCI dataset mentioned above. You are not required to program everything from scratch, but if you reuse code or libraries you have to cite the original source; otherwise you might be accused of plagiarism.
The assignment is further explained in Section 2. Section 3 of this document gives you an example of how to structure your report and explains the marking scheme. Section 4 presents the deadlines and submission instructions. Section 5 explains the penalties for late submissions, and Section 6 explains how the College deals with plagiarism. Section 7 and Section 8 provide additional information on learning resources and referencing.
2. Implementation and experimentation
You can do your own implementation in MATLAB, write your own code, or build on a package/library from the internet. You are not tested on programming, so the coding style does not have to be perfect and your code does not have to be optimal, but it should obviously work correctly. I wouldn’t recommend implementing all the methods required for training neural networks, e.g. backpropagation, derivatives calculation etc., from scratch unless you are very experienced with Java, C++, Python or some other programming language or platform. No matter what you do/use, make sure that all sources and code taken from others or from the internet are cited properly in your Report; otherwise, you may be accused of plagiarism.
Some packages provide techniques for determining the optimal structure of machine learning models (e.g. the model architecture or the model hyperparameters) automatically as part of the training. In that case, instead of performing experimental tests varying the number of free parameters of the model, these techniques can be used to find the appropriate structure for your model. Still, some of these methods may have their own parameters, which require fine-tuning.
Note that performance results may sometimes be more meaningful if a validation technique is used, such as k-fold cross-validation (k=7 or k=10 is typically used), leave-one-out cross-validation, or some form of Monte Carlo simulation. Lastly, the use of regularisation, provided in some software packages and in MATLAB, normally helps to get better results.
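If your chosen library does not provide cross-validation out of the box, the bookkeeping is straightforward to write yourself. The sketch below is a generic k-fold splitter in plain Python; `train_fn` and `eval_fn` are placeholder hooks you would replace with your own training and scoring code.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(train_fn, eval_fn, X, y, k=10):
    """Train on k-1 folds, score on the held-out fold; return per-fold scores."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i in range(k):
        train_idx = [j for m in range(k) if m != i for j in folds[m]]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        scores.append(eval_fn(model,
                              [X[j] for j in folds[i]],
                              [y[j] for j in folds[i]]))
    return scores
```

Averaging the returned per-fold scores (and reporting their spread) gives a more robust performance estimate than a single train/test split.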
The results of your experiments should be stored in ASCII format, in a Jupyter notebook, or in notebook documents produced by other web-based interactive computational environments, specifying whether each result is from the training or the testing phase, and should be submitted together with your report (Moodle allows you to submit additional files; up to 5 files in total can be submitted). Check that these files can be opened and read correctly. Results should be presented using figures and tables and discussed in your Report, i.e. it is not enough just to submit a Python notebook or files with results; these are not accepted as a Report submission.
3. Assignment outline and marking scheme
Your work will be presented in a Report (notebook documents are not accepted as a Report). It is important that your Report is properly structured. Sections like the ones shown below could be included in your report to ensure good coverage of the topic. Approximately 2000-2500 words are expected to cover all aspects of the assignment in sufficient depth, but our marking is not based on the number of words used in the Report. What primarily matters is that you describe the design of your system/model, justify any choices you made, explain how things work, and make the model work with data. You will also need to provide insight into how to (pre)process the data before feeding it into the model (if necessary) and do the training, how to debug the learned model (not only the training algorithm), and how to measure model performance and demonstrate its significance. You are not just being marked on how good the results of your training algorithm or neural model are. However, when awarding marks, the work described in the report is marked in comparison to the work of other students. The following marking scheme is used.
1. Methodology, design and technical contribution (40% of the mark): the appropriate use and sophistication of design methods, overall methodology and implementation is marked here
1.1 This part should normally describe clearly the approach used in your design and implementation and any relevant parameters (e.g. for neural network models this includes number of hidden nodes, layers, type of activation functions, training algorithm parameters etc.). If you are using a particular library or tool, you still need to describe how the methods/functions that you are using operate and what parameter values are used. Citing the library, tool, etc., just listing library functions or just stating that “default settings are used” is not enough to get a good mark.
1.2 This part should describe and justify any special techniques/methods or parameters used in the stages of your methodology, e.g. in data preprocessing, in initialisation or during training. Also, this part should describe and justify any techniques for normalisation or missing data, or other pre-processing or balancing methods, and whether you have used some form of cross-validation, regularisation, or weight decay, providing details of the particular method. As mentioned above, citing the library, tool, etc., or simply mentioning the library functions is not enough to get a high mark.
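As an illustration of the kind of preprocessing referred to here, z-score normalisation (standardisation) can be sketched in a few lines; the key point to justify in your report is that the statistics are fitted on the training set only and then reused for the test set.

```python
import math

def zscore_fit(X):
    """Per-feature mean and standard deviation, computed on training data only."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    # Guard against constant features: fall back to 1.0 when std is zero.
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in X) / n) or 1.0
           for j in range(d)]
    return mean, std

def zscore_apply(X, mean, std):
    """Standardise features; reuse the training-set statistics for the test set."""
    return [[(v - m) / s for v, m, s in zip(row, mean, std)] for row in X]
```

Most toolboxes (MATLAB's `mapstd`, scikit-learn's `StandardScaler`) implement the same idea; if you use one of those, describe its behaviour rather than just naming the function.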
1.3 This part should describe your technical contribution: whether your code fully runs or not and how you have tested the code (especially important if your code does not run; it makes sense to show that you have made a serious attempt to identify any issues); whether the code is based on existing code and whether you have made any effort to combine different components; whether a novel implementation has been produced and/or you have produced your own modifications. If the implementation has problems/bugs, it does not automatically mean that you have failed the coursework.
2. Experiments, findings and discussion (50% of the mark): the experimental design and the investigation’s systematicity are marked here. You must present and discuss your experimental design and results. You are expected to run several experiments, explore the behaviour of the model during training, under different conditions, and calculate basic statistics to summarise performance in training and testing. You can explore the behaviour of various neural network training algorithms if you want, or compare with other machine learning methods that you have implemented or reused. That work will enhance the experimental study. You also need to discuss the significance of your findings or comparisons that you have made. Your report must include at least two figures which graphically illustrate quantitative aspects of your results, such as training/testing errors, performance metrics for sets of learned parameters, algorithm outputs, descriptive statistics, etc.
For example, in this part you can use Excel or other packages to provide charts like the figure below, which uses error bars (Box and Whisker Charts in Excel) to show the performance of your trained model in terms of generalisation. For example, the figure below shows generalisation with respect to the number of hidden nodes used in a neural network-based solution. Alternatively, one could use tables to provide the same information by giving, for each number of hidden nodes, the average value, the minimum value, and the maximum value of generalisation performance (in percentage of successfully recognised patterns) in the tests.
[Figure: box-and-whisker chart of generalisation performance in the tests (%, y-axis from 88 to 97) against the number of hidden nodes (x-axis from 5 to 30).]
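The tabular alternative described above (minimum, average and maximum generalisation per number of hidden nodes) can be computed with a few lines of standard-library Python; the scores below are made-up placeholders, not results from the actual wine-quality experiments.

```python
import statistics

# Generalisation scores (% of test patterns recognised) over repeated
# training runs, grouped by the number of hidden nodes (placeholder values).
runs = {
    5:  [89.0, 91.2, 90.1],
    10: [92.3, 93.8, 91.9],
    15: [93.5, 94.7, 94.1],
}

summary = {nodes: (min(s), statistics.mean(s), max(s))
           for nodes, s in runs.items()}
for nodes, (lo, avg, hi) in summary.items():
    print(f"{nodes:>2} hidden nodes: min {lo:.1f}  mean {avg:.1f}  max {hi:.1f}")
```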
You could also discuss the cost of the computations, e.g. referring to the number of training iterations required or the number of error function evaluations (see figure below for the neural-network based solution discussed above). In the same way, you can also present and discuss other aspects of the training phase, e.g. the impact of hyperparameters, folds, balancing methods etc.
[Figure: chart of training cost (y-axis from 50 to 550) against the number of hidden nodes (x-axis from 5 to 30).]
In machine learning, experimental comparison/evaluation is presented for a few methods and their parameters. Overall results are also presented in tables like the one below, which shows the average performance of two models, trained using two different training algorithms, on the same test set. In this case, performance is shown in terms of recognition success per class as well as average classification success. Confusion matrices can also be used.
Method                      | Class 1 (%) | Class 2 (%) | Average success (%)
Algorithm 1 - trained Model | 83          | 96          | 93
Algorithm 2 - trained Model | 73          | 93          | 88
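Per-class recognition success of the kind tabulated above can be derived from a confusion matrix. A minimal sketch (the class labels and counts below are illustrative, not taken from the table):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows index the true class, columns the predicted class."""
    pos = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m

def per_class_success(m):
    """Fraction of each class's test patterns that were recognised correctly."""
    return [row[i] / sum(row) for i, row in enumerate(m)]

y_true = [1, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 2]
m = confusion_matrix(y_true, y_pred, classes=[1, 2])
rates = per_class_success(m)
```

The diagonal of the matrix holds the correctly classified counts, so both the per-class and the average success rates in your report can be read off from it.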
3. Conclusions (10% of the mark)
3.1 Provide an honest and justified overview/summary of your work and a critical view of the findings.
3.2 Identify areas for improvement; discuss what you could have done better (particularly important if you failed some of your targets or your results are not as expected).
4. Bibliography: this is not marked directly but is necessary as it supports the justification of your methodology and methods and the claims/arguments in your report. When it comes to references, I prefer the Harvard referencing system (see Section 8), but no matter what system you use, try to be consistent and make sure that all sources are cited in the text of the report and are also listed in the bibliography section; this way you don’t get in any trouble with plagiarism detection software.
Provide a list of the bibliographical/web sources you used (see Section 7). Include publication details and all information necessary to access the online resources. Sources should be cited in the text by (Author name, year) and appear in the references list in alphabetical order by the author’s last name. This also applies to websites, e.g. an online article/webpage should be listed in your references; for example:
(MLOSS, 2011). Machine learning open source repository. Available online at http://mloss.org/
NOTE: use of any text or code (even open-source code) taken from other sources should be clearly identified and referenced in your report to avoid plagiarism (see Section 6). If you are unsure about which parts of your code need appropriate referencing, do consult the module lecturer.
4. Deadlines and submission instructions
Submission is only through Moodle and consists of the submission of a Report, and data files with your results and code as appendices (if existing MATLAB toolboxes or libraries have been used, these should be mentioned in the Report). Important parts of your code can be included and discussed in the report, but a Notebook document, a copy of the MATLAB editor etc. is not accepted as a Report submission. The preferred format for the Report is a Word document, but PDF or RTF are also accepted, with data and code embedded in the file. Your report and code will be tested for plagiarism.
We are unfortunately unable to accept and mark reports that contain text embedded as an image into the document. These documents cannot go through the plagiarism detection system and will not be marked. Also, ZIP files are not accepted as they cannot go through the plagiarism detection system, and they will not be marked.
Make sure you are familiar with Moodle and able to upload your files (for example, you could test the system by uploading a test file). Hardcopy versions of the report/data/code files will not be accepted. We do not accept submissions by email.
You should upload the completed assignment on Moodle by August 9th, 2022 at 1:00pm (this is Moodle’s clock/server time, not your PC’s time; if you are planning to upload your files whilst at a remote location, make sure you check Moodle’s time and take into account time zone differences).
I would encourage you to include data and code as appendices in the Report and upload only one file, but sometimes this may not be possible. In any case, your files should be named according to your last name. For example, MAGOULAS_Report.doc; MAGOULAS_Results.xls etc.
Your Report must have a cover page!
The cover page MUST have the following information:
Module title and code: Machine Learning- COIY065H7
Name: your first name and last name
Student ID: provide your ID
Emails: provide your College email AND the email you use, if different from your College email
Your Report should have an Appendix with a description of the data files and code submitted, when these are not included in the Report as appendices but are submitted separately. Any specific instructions on software use should also be included in an Appendix of the report, entitled “Instructions for using the code”.
It is your responsibility to ensure that all files and documents transferred from your own machines are the latest versions and in the correct format, and that any programs execute as intended on the Department’s systems prior to the submission date.
Each piece of submitted work MUST also have a page entitled “Academic Declaration” by the author, which certifies that the author has read and understood the sections on plagiarism in the document http://www.bbk.ac.uk/mybirkbeck/services/rules/Assessment%20Offences.pdf that describes the College’s Policy on assessment offences. Confirm that the work is your own, with the work of others fully acknowledged. Submissions must also be accompanied by a declaration giving us permission to submit your report to the plagiarism testing database that the College is using.
Reports without a Declaration form are not considered as completed assignments and are not marked.
The Academic Declaration should read as follows: “I have read and understood the sections of plagiarism in the College Policy on assessment offences and confirm that the work is my own, with the work of others clearly acknowledged. I give my permission to submit my report to the plagiarism testing database that the College is using and test it using plagiarism detection software, search engines or meta-searching software.”
You should note that all original material is retained by the Department for reference by internal and external examiners when moderating and standardising the overall marks after the end of the module. Only the original marks awarded will be shown on Moodle, i.e. any penalty or capping will be applied at a later stage, after examining any mitigating circumstances claims.
It is our policy to accept and mark late submissions of coursework. You do not need to negotiate new deadlines and there is no need to obtain prior consent of the module lecturer.
The last day the system will accept a late submission for this module is August 19th, 2022 at 1:00pm (this is Moodle’s time, not your PC’s time; if you are planning to upload your files whilst at a remote location, make sure you check Moodle’s time and take into account time zone differences). All coursework submitted by the cut-off deadline will be marked.
2022-07-25