IM931 Interdisciplinary Approaches to Machine Learning

Spring 2021

Laboratory Assignment and Report

Due Wednesday 17th February 2021, 12pm (Week 6)


Length: 2000 words (15 CATS) or 3000 words (20 or 30 CATS)

Word count does not include tables, figures, or references/bibliography.

Using the training and test sets for the MNIST dataset of bitmapped hand-drawn digits provided on Moodle, you will explore hyperparameter settings for three different model architectures: (1) single-layer perceptron, (2) multi-layer perceptron (MLP), and (3) convolutional neural network (CNN). Your goal is to search this hyperparameter "space" for the models that give the best accuracy on a held-out test set.
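To make the size difference between the three architectures concrete, the short Python sketch below counts trainable parameters (weights plus biases) for fully connected and convolutional layers. It assumes 28×28 single-channel images flattened to 784 inputs for the dense models; the layer sizes used (64 hidden units, 32 filters of size 3×3) are illustrative choices only, not recommendations.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Weights plus biases for a 2-D convolution with c_out filters of size k_h x k_w."""
    return k_h * k_w * c_in * c_out + c_out


# Single-layer perceptron: 784 pixel inputs mapped directly to 10 digit classes.
perceptron = dense_params(784, 10)

# MLP with one hidden layer of 64 units (an illustrative size).
mlp = dense_params(784, 64) + dense_params(64, 10)

# First layer of a CNN: 32 filters of size 3x3 over a 1-channel image.
cnn_first_layer = conv2d_params(3, 3, 1, 32)

print(perceptron, mlp, cnn_first_layer)  # 7850 50890 320
```

Note that the convolutional layer's count does not depend on the image's height or width, only on the filter size and the number of input and output channels.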


For R users:

You should use the dataset named mnist2, which is stored in the file mnist2.Rdata inside IM931_Midterm_R.zip on Moodle. To load the dataset, use the following command:

> load('mnist2.Rdata')

This creates a new variable, mnist2, in your session. It contains 50,000 training examples (mnist2$train$x and mnist2$train$y) and 10,000 held-out test examples (mnist2$test$x and mnist2$test$y).


For Python users:

You should use the dataset saved as mnist2.json in the IM931_Midterm_Python.zip file on Moodle. To load the dataset, you can use the following commands:

import json

with open("mnist2.json") as f:
    mnist2 = json.load(f)

This creates a new variable, mnist2, in your session. It contains 50,000 training examples (mnist2["train"]["x"] and mnist2["train"]["y"]) and 10,000 held-out test examples (mnist2["test"]["x"] and mnist2["test"]["y"]).
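JSON has no array type, so json.load returns the data as nested Python lists. A minimal sketch for converting one split into NumPy arrays follows. It assumes that each entry of "x" holds the 784 pixel values of one digit (stored either flat or as 28 rows of 28) and that "y" holds integer labels 0 to 9; the rescaling step assumes pixels may be stored as 0 to 255 intensities, which you should check against the actual file.

```python
import numpy as np


def to_arrays(split):
    """Convert one split of mnist2 (e.g. mnist2["train"]) into (x, y) NumPy arrays."""
    x = np.asarray(split["x"], dtype=np.float32)
    # Flatten each example to a 784-long vector, whether stored flat or as 28x28.
    x = x.reshape(len(x), -1)
    # Assumption: rescale to [0, 1] only if pixels look like 0-255 intensities.
    if x.max() > 1.0:
        x = x / 255.0
    y = np.asarray(split["y"], dtype=np.int64)
    return x, y
```

Usage: x_train, y_train = to_arrays(mnist2["train"]), and likewise for mnist2["test"]. If the layout assumption holds, x_train.shape should be (50000, 784).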


Your main assignment is to detail the results of six (6) distinct experiments across these three major architectures (at least one per architecture), in which you systematically vary one or more of the hyperparameters discussed in lecture:

• Epochs

• Batch size

• Learning rate

• Number and type of layer

• Number of units in each layer

• Choice of nonlinearity (e.g. tanh vs. ReLU)

• Weight initialization

• Regularization/Weight decay

• Use of dropout / dropout parameters
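One simple way to vary hyperparameters systematically is a full grid, in which every combination of a few candidate values is trained and scored once. The sketch below only builds the list of configurations; the hyperparameter names and candidate values are illustrative, and train_and_evaluate in the comment is a hypothetical placeholder for your own training code.

```python
from itertools import product

# Illustrative candidate values; choose your own ranges.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 128],
    "hidden_units": [32, 64, 128],
}

# One dict per combination, e.g. {"learning_rate": 0.1, "batch_size": 32, "hidden_units": 32}.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))  # 3 * 2 * 3 = 18 runs

# for cfg in configs:
#     accuracy = train_and_evaluate(**cfg)  # hypothetical helper, not provided
```

Because grid size multiplies quickly (18 runs here), it is often more practical to vary one or two hyperparameters at a time while holding the rest fixed.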

Please provide simple figures and tables, as needed, showing how predictive accuracy and/or training time improve or worsen as you systematically modify these hyperparameters. (Depending on the choice of variables, you might consider using a linear model to interpret the data collected for one or more experiments, but this is not strictly necessary.) More importantly, please clearly describe the model and your experimental manipulations of it in English and, where relevant, cite papers or other texts appropriately (see below for example citations). The choice of experiments is up to you: you should try several different strategies, and you may report only on the results that interest you or that help you build a more predictive model.
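If you do fit a linear model, one common choice for a learning-rate experiment is to regress accuracy on the logarithm of the rate, since candidate rates are usually spaced by factors of ten. A minimal NumPy sketch follows; learning_rates and accuracies stand for your own recorded results, and the log-scale predictor is an assumption about what suits your data rather than a requirement.

```python
import numpy as np


def fit_log_trend(learning_rates, accuracies):
    """Least-squares fit of accuracy = slope * log10(learning_rate) + intercept."""
    x = np.log10(np.asarray(learning_rates, dtype=float))
    y = np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # a degree-1 polynomial is a straight line
    return slope, intercept
```

A positive slope would suggest that accuracy rises with the learning rate over the range you tested; with only a handful of points, treat the fit as a description of the trend rather than a formal test.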

Each set of experiments on a given architecture should lead to a model that you construct based on your observations (e.g., if your experiments show that an MLP with one hidden layer of 64 units performs best, you should instantiate and fit that model). Along with your report, you will submit a supplemental .R or .py file that defines three models, one per architecture, which you expect to perform best on unseen, held-out test data.
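Choosing which configuration to carry forward can be made explicit once runs are logged. The sketch below assumes each run was recorded as a dict with the hypothetical keys "architecture", "config" and "test_accuracy", and simply keeps the highest-accuracy run for each architecture.

```python
def best_per_architecture(runs):
    """Return {architecture: run}, keeping the run with the highest test accuracy."""
    best = {}
    for run in runs:
        arch = run["architecture"]
        # Strict ">" means that on a tie the run logged first is kept.
        if arch not in best or run["test_accuracy"] > best[arch]["test_accuracy"]:
            best[arch] = run
    return best
```

When two configurations differ by less than the run-to-run variation you observe, you may prefer the simpler one for your submitted model.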


For R users:

We have included three supporting files to help you get started (also on Moodle). IM931_Midterm.R contains code for a single model. IM931_Midterm-Tuning_Runs.R contains code that systematically modifies certain hyperparameters of that model and conveniently produces tables (as R data frames), which you can then describe in English and accompany with simple plots where relevant. The third file, IM931_Midterm-ShowPredictionsExample.R, provides example code for visualizing misclassified digits, as well as code for visualizing the top 25 most misclassified digits.
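For Python users who want a comparable view, one plausible way to rank the "most misclassified" digits is by how little probability the model assigned to the correct class. The NumPy sketch below assumes probs is an (n_examples, 10) array of predicted class probabilities and y_true holds integer labels; this ranking rule is an assumption and may differ from the one used in IM931_Midterm-ShowPredictionsExample.R.

```python
import numpy as np


def most_misclassified(probs, y_true, k=25):
    """Indices of the k misclassified examples with the lowest probability on the true class."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    wrong = np.flatnonzero(y_pred != y_true)       # indices of misclassified examples
    true_class_prob = probs[wrong, y_true[wrong]]  # probability given to the correct label
    order = np.argsort(true_class_prob)            # most confidently wrong first
    return wrong[order][:k]
```

Each selected row of the test inputs can then be reshaped to 28×28 and displayed, for example with matplotlib's imshow.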


For Python users:

We have provided a single IPython notebook with examples, IM931_Midterm.ipynb. It contains code for an example multilayer perceptron whose performance is explored (or ‘tuned’) over (1) the number of epochs and (2) the learning rate together with the number of units in the first layer. As with the R files, this code systematically modifies certain hyperparameters of the model and conveniently produces tables (as data frames from the pandas library), which you can then describe in English and accompany with simple plots where relevant. The notebook also includes a short snippet that exports a table to Excel; this requires the openpyxl Python library, which you can install via Anaconda Navigator or with the following commands in a terminal:
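The notebook already builds such tables; for orientation, the general pattern can be sketched with pandas as below. Here records is a list of per-run dicts that you log yourself (column names such as test_accuracy are hypothetical), and DataFrame.to_excel needs openpyxl installed in order to write an .xlsx file.

```python
import pandas as pd


def results_table(records, excel_path=None):
    """Turn a list of per-run dicts into a DataFrame sorted by test accuracy (best first)."""
    table = pd.DataFrame(records).sort_values("test_accuracy", ascending=False)
    table = table.reset_index(drop=True)
    if excel_path is not None:
        table.to_excel(excel_path, index=False)  # requires openpyxl for .xlsx output
    return table
```

Leaving excel_path as None returns the table without touching the file system, which is convenient while you are still experimenting.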

conda activate hoai

pip install openpyxl


Please make sure to provide a clear introductory paragraph and a clear concluding paragraph, in your own words, describing your goals and the conclusions you draw from your experiments.

Submit your final report as username_report1.docx and your models as username_models1.R or username_models1.py (where username is your Warwick ID).

Examples of papers which you may want to cite:

Rectified Linear Units / ReLU: Nair & Hinton (2010)

Dropout: Hinton, Srivastava, Krizhevsky, Sutskever, & Salakhutdinov (2012)

Xavier Initialization: Glorot & Bengio (2010)

Multilayer Perceptron: Rumelhart, Hinton, & Williams (1986)
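Three of the cited techniques reduce to a few lines each. The NumPy sketch below shows ReLU, Glorot (Xavier) uniform initialization with limit sqrt(6 / (fan_in + fan_out)), and dropout. The dropout shown is the "inverted" variant, which rescales surviving units during training; it is a common modern implementation rather than the exact procedure of Hinton et al. (2012), who instead scaled the outgoing weights at test time. The parameter rate is the fraction of units dropped, and the fixed seed is only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    """Rectified linear unit: max(0, z) elementwise (Nair & Hinton, 2010)."""
    return np.maximum(0.0, z)


def glorot_uniform(fan_in, fan_out):
    """Weights drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def dropout(a, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)
```

Because surviving activations are already rescaled during training, calling dropout(h, rate, training=False) at test time returns h unchanged.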


References

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249–256). Retrieved from http://proceedings.mlr.press/v9/glorot10a.html

Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. Retrieved from https://arxiv.org/abs/1207.0580v1

Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (pp. 807–814). USA: Omnipress. Retrieved from http://dl.acm.org/citation.cfm?id=3104322.3104425

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. https://doi.org/10.1038/323533a0