Department of Computer Science
Illinois Institute of Technology
Vijay K. Gurbani, Ph.D.
Spring 2020: Homework 4 (10 points)
Due date: Friday May 01, 2020, 11:59:59 PM Chicago Time
Please read all of the parts of the homework carefully before attempting any question. If you detect any ambiguities in the instructions, please let me know right away instead of waiting until after the homework has been graded.
This is an INTERMEDIATE version of the homework.
1. Exercises (X points, divided evenly among the questions) Please submit a PDF file containing answers to these questions. Any other file format will lead to a loss of 0.5 points. Non-PDF files that cannot be opened by the TAs will lead to a loss of all points.
1.1 Tan Ch. 4, questions 14 and 15.

1.2 To be assigned.

2 Practicum problems (X points, divided as specified by each assignment) Please label your answers clearly; see the Homework 0 R notebook for an example (the Homework 0 R notebook is available in “Blackboard → Assignment and Projects → Homework 0”). Each answer must be preceded by the R markdown header shown in the Homework 0 R notebook (### Part 2.1-A-ii, for example). Failure to clearly label the answers in the submitted R notebook will lead to a loss of 2 points per problem below.

2.1 Principal component analysis (Points TBD)
In this assignment, you will perform PCA on a small dataset and interpret its results.
The dataset is derived from the 2007 CIA World Factbook and contains 8 attributes. These are:
Variable   Meaning
GDP        GDP per capita in US dollars
HIV        HIV prevalence as a percentage of the population
Lifeexp    Life expectancy (in years)
Mil        Military spending as a percentage of GDP
Oilcons    Oil consumption in barrels per annum per capita
Pop        Population (in millions)
Tel        Number of fixed telephone lines per 1,000 people
Unempl     Percentage of the population unemployed
The dataset is available in Blackboard in the file countries.csv. Read the dataset into a data frame, taking care to assign the first column to the row names instead of treating it as an attribute. (Hint: see the row.names parameter of the read.csv(…) function.)
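For reference, a minimal sketch of the read (the working-directory location and the data-frame name countries are assumptions):

    countries <- read.csv("countries.csv", header = TRUE, row.names = 1)
    str(countries)   # the eight attributes; country names are now the row names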
(a) (i) Print a summary of all of the attributes in the dataset to become familiar with their values and ranges.
(ii) Plot a boxplot of all of the attributes. There are two outliers associated with the Pop boxplot. What do you think they represent?
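A sketch of how part (a) might look, reusing the countries data frame from the sketch above:

    summary(countries)           # (i) value ranges of all eight attributes
    boxplot(countries, las = 2,  # (ii) one box per attribute; las = 2 rotates the axis labels
            main = "Boxplots of the country attributes")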
(b) Perform a PCA transformation on this dataset.
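One possible call is sketched below; whether to center and scale is your decision, but note that the attributes are on very different scales:

    pca <- prcomp(countries, center = TRUE, scale. = TRUE)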
(c) (i) Print the summary of the PCA object. How many components explain at least 90% of the variance?
(ii) Draw a screeplot of the PCA object.
(iii) Based on the screeplot, how many components would you use for modeling if you were to engage in a feature reduction task?
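A sketch for part (c), assuming the pca object from (b):

    summary(pca)                    # (i) proportion and cumulative proportion of variance explained
    screeplot(pca, type = "lines",  # (ii) variance of each component
              main = "Screeplot of the countries PCA")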
(d) Print the PCA components (the “rotation” field of the PCA object). Let’s focus on PC1 and PC2.
(i) Which attributes is PC1 positively correlated with, and which attributes is it negatively correlated with?
Based on this, what do you expect PC1 to represent?
(ii) Which attributes is PC2 positively correlated with, and which attributes is it negatively correlated with?
Based on this, what do you expect PC2 to represent?
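For part (d), the loadings can be printed directly (a sketch, again assuming the pca object from (b)):

    pca$rotation            # loadings of all eight attributes on every component
    pca$rotation[, 1:2]     # just PC1 and PC2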
(e) Draw a biplot with the first and second components. Then, answer the following questions:
(i) Examine the rotated variables (the “x” field of the PCA object) for the first and second components for Brazil, UK, and Japan. Print these two columns out.
(ii) Using the information in (d)(i) and (d)(ii), explain whether the values of PC1 and PC2 for Brazil, UK, and Japan make sense.
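A sketch for part (e); the row labels “Brazil”, “UK”, and “Japan” are assumptions, so use whatever spellings appear in your row names:

    biplot(pca, cex = 0.6)                      # first and second components by default
    pca$x[c("Brazil", "UK", "Japan"), 1:2]      # (i) rotated values (scores) on PC1 and PC2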
2.2 Feed Forward Neural Networks (Points TBD)
The dataset available in the file activity-small.csv [1] in Blackboard contains data in three dimensions from a single chest-mounted accelerometer. The dataset is intended for activity recognition research purposes. It provides challenges for identifying people using motion patterns. The predictor variables are in three dimensions: x-acceleration, y-acceleration, and z-acceleration. The class label is one of {0, 1, 2, 3}, where 0 implies “Working on the computer,” 1 implies “Standing up, walking, and going up/down the stairs,” 2 implies “Walking,” and 3 implies “Talking while standing.”

The dataset contains 1,000 observations, of which 80% should be used for training a neural network and 20% for testing. The trained model will be used to predict the class label.

To get you started, please use the file Problem-2-2-Template.Rmd as it contains starter code to scale the dataset and to set the seed. Add your code to the template file as indicated in the file.
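If the template does not already split the data, a minimal 80/20 split might look as follows (the object names act_scaled, X_train, y_train, etc., and the column layout are assumptions; the seed is set in the template):

    # Assumption: act_scaled holds the scaled data, columns 1-3 are the
    # accelerations and column 4 is the (unscaled) class label.
    train_idx <- sample(nrow(act_scaled), size = floor(0.8 * nrow(act_scaled)))
    X_train   <- as.matrix(act_scaled[train_idx, 1:3])
    X_test    <- as.matrix(act_scaled[-train_idx, 1:3])
    y_train   <- act_scaled[train_idx, 4]
    y_test    <- act_scaled[-train_idx, 4]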

(a) Create a shallow neural network and train it on the training dataset so that it predicts one of the four classes: {0, 1, 2, 3}. You may experiment with different activation functions (sigmoid, relu, tanh) in the first hidden layer. For the output layer, use the softmax activation function. Measure the loss using ‘categorical_crossentropy’ with the ‘adam’ optimizer.

Train the network for 100 epochs, using a batch size of 1 (stochastic gradient descent, i.e., the weights are updated after every observation).
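A minimal sketch of such a network with the keras package (the hidden-layer size of 8 and the relu choice are illustrative, and X_train/y_train follow the naming assumed earlier):

    library(keras)

    model <- keras_model_sequential() %>%
      layer_dense(units = 8, activation = "relu", input_shape = 3) %>%  # single hidden layer
      layer_dense(units = 4, activation = "softmax")                    # one output unit per class

    model %>% compile(loss      = "categorical_crossentropy",
                      optimizer = "adam",
                      metrics   = "accuracy")

    model %>% fit(X_train, to_categorical(y_train, num_classes = 4),
                  epochs = 100, batch_size = 1)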

Use your trained model to make predictions on the held-out testing dataset and create a confusion matrix.
(i) What is the overall accuracy of your model on the test dataset?
(ii) What is the per-class sensitivity, specificity, and balanced accuracy on the test dataset?
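The predictions and per-class statistics in (i) and (ii) can then be pulled from caret's confusionMatrix() (a sketch under the same naming assumptions):

    library(caret)

    pred_prob  <- model %>% predict(X_test)   # one probability column per class
    pred_class <- max.col(pred_prob) - 1      # map the most likely column back to labels 0..3

    cm <- confusionMatrix(factor(pred_class, levels = 0:3),
                          factor(y_test,     levels = 0:3))
    cm$overall["Accuracy"]                                                # (i) overall accuracy
    cm$byClass[, c("Sensitivity", "Specificity", "Balanced Accuracy")]    # (ii) per-class statistics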

(b) Train the network for 100 epochs using the following batch sizes: 1, 32, 64, 128, 256, while keeping all other parameters the same as in (a). (You are now doing mini-batch gradient descent.)

Each time you train the network with a given batch size, time how long the training takes. A quick and dirty way to do so is as follows:
> begin <- Sys.time()
> model %>% fit(…)    # train with the current batch size
> end <- Sys.time()
> end - begin          # elapsed training time

> # Predict on the held-out testing dataset, and get the conf. matrix (see (c) below)

The time it took to train the network is the difference between the ‘begin’ and ‘end’ timestamps. (Note that this is elapsed wall-clock time rather than CPU time, but it is close enough for our understanding.)
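A sketch of a full timing sweep; build_model() is a hypothetical helper you would write to re-create and compile the network from (a), so that every run starts from fresh weights:

    batch_sizes <- c(1, 32, 64, 128, 256)
    times <- numeric(length(batch_sizes))

    for (i in seq_along(batch_sizes)) {
      model <- build_model()   # hypothetical helper: rebuilds the network from (a)
      begin <- Sys.time()
      model %>% fit(X_train, to_categorical(y_train, num_classes = 4),
                    epochs = 100, batch_size = batch_sizes[i], verbose = 0)
      end <- Sys.time()
      times[i] <- as.numeric(difftime(end, begin, units = "secs"))
      # predictions for part (c) can be gathered here as well
    }

    data.frame(batch_size = batch_sizes, time_seconds = round(times, 1))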

Why does the time vary as you increase the batch size? Your .Rmd file should contain a table that looks like the following:
Batch size   Time (seconds)
1
32
64
128
256
(c) While you are doing (b) above, gather predictions on the held-out testing dataset from each of the trained networks as you vary the batch size. For each run, enumerate the:
(i) Overall accuracy of your model on the test dataset.
(ii) Per-class balanced accuracy on the test dataset.
Your .Rmd file should contain a table that looks like the following:
Batch size   Overall accuracy   Balanced accuracy per class
1
32
64
128
256

Comment on the aspects above. (Do the overall accuracy, balanced accuracy, and per-class statistics remain the same? Do they change? If they change, why?)

(d) Starting with the network in (a), add one more hidden layer to your network, re-train your model, and see if adding a second hidden layer produces better performance. Note that deciding how many layers to add and how many neurons to use per layer is more of an art than an exact science. There are some heuristics, but much of the skill comes from experimenting with the neural network and observing its performance as layers (and neurons) are added. I will let you experiment with the number of neurons and the specific activation function in the new hidden layer.
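As one possible starting point, a sketch of a two-hidden-layer construction (the unit counts and relu activations are illustrative choices, not the required answer):

    model2 <- keras_model_sequential() %>%
      layer_dense(units = 8, activation = "relu", input_shape = 3) %>%
      layer_dense(units = 8, activation = "relu") %>%                  # the added second hidden layer
      layer_dense(units = 4, activation = "softmax")

    model2 %>% compile(loss      = "categorical_crossentropy",
                       optimizer = "adam",
                       metrics   = "accuracy")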

Take a principled approach and document how you constructed your new hidden layer, and for each such construction, enumerate the:
(i) Overall accuracy of your model on the test dataset.
(ii) Per-class sensitivity, specificity, and balanced accuracy on the test dataset.
Pick the construction that had the best performance and compare it to the performance you observed in (a). Comment on the changes you observed by adding a new hidden layer. (Does the performance increase? Decrease? Stay the same?)
[1] The entire activity recognition dataset is available at
https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer;
the dataset in activity-small.csv has been curated to reduce its size and the number of class labels.
2.3 Recommendation Systems (Points TBD)
To be assigned.