Summer 2018

Introduction to Deep Learning

 

Assignment 3: CNNs

In this assignment, you will create a better model for the MNIST dataset using convolutional neural networks.

CNN for MNIST

You should have seen that modifying layer sizes, changing activation functions, etc. is simple: you can generally change parts of the model without affecting the rest of the program. In fact, you can swap out the full pipeline from input to model output without having to change anything else (restrictions apply).

Replace your MNIST MLP with a CNN. You can check this tutorial for an example. Note: Depending on your machine, training a CNN may take much longer than the MLPs we’ve seen so far. Also, processing the full test set in one go for evaluation might be too much for your RAM. In that case, you could break up the test set into smaller chunks and combine the results (e.g. count correct predictions over all chunks and divide by the test-set size; simply averaging per-chunk accuracies is slightly off if the last chunk is smaller). You could also remove dropout and make the dense layer at the end smaller.
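Chunked evaluation can be sketched as follows. This is a minimal NumPy sketch, independent of your actual model; chunked_accuracy and predict_fn are illustrative names, not part of any starter code. Note that summing correct predictions and dividing by the total at the end handles an undersized final chunk correctly:

```python
import numpy as np

def chunked_accuracy(predict_fn, images, labels, chunk_size=1000):
    """Evaluate accuracy in chunks to limit memory use.

    predict_fn(batch) is assumed to return predicted class indices.
    Correct predictions are summed over all chunks and divided by the
    total count, so unequal chunk sizes are handled properly.
    """
    n = len(images)
    correct = 0
    for start in range(0, n, chunk_size):
        batch_x = images[start:start + chunk_size]
        batch_y = labels[start:start + chunk_size]
        preds = predict_fn(batch_x)
        correct += np.sum(preds == batch_y)
    return correct / n

# Toy usage with a dummy "model" that always predicts class 0:
images = np.zeros((10, 28, 28, 1))
labels = np.array([0] * 7 + [1] * 3)
acc = chunked_accuracy(lambda x: np.zeros(len(x), dtype=int),
                       images, labels, chunk_size=4)
print(acc)  # 0.7
```

In a TensorFlow 1.x session, predict_fn would simply be a wrapper around sess.run on your prediction op with the chunk fed in.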

If your CNN is set up well, you should reach extremely high accuracy results. This is arguably where MNIST stops being interesting. If you haven’t done so, consider working with Fashion-MNIST instead (see Assignment 1). This should present more of a challenge and make improvements due to hyperparameter tuning more obvious/meaningful.

Probing the Network

Having set up your basic CNN, you should include some visualization. In particular, one technique often used to diagnose CNN performance is visualizing the filters, i.e. the weights of the convolutional layers. The only filters that are straightforward to interpret are those in the first layer, since they operate directly on the input. The filter matrix should have shape filter_width x filter_height x 1 x n_filters. Visualize each of the n_filters filters as a small image. You can do this via tf.summary.image (which lets you watch the filters develop over training). Alternatively, you can use a library such as matplotlib, which offers many more plotting options (better colormaps in particular).
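One convenient way to display all first-layer filters at once is to tile them into a single grid image. The following is a hedged NumPy sketch (filter_grid is an illustrative helper, not a library function); the returned array can be passed to matplotlib's imshow:

```python
import numpy as np

def filter_grid(weights, pad=1):
    """Tile first-layer conv filters into one 2-D image for plotting.

    weights: array of shape (filter_height, filter_width, 1, n_filters),
    as produced by a conv layer over grayscale input. Each filter is
    normalized to [0, 1] individually so its structure is visible.
    """
    fh, fw, in_ch, n = weights.shape
    assert in_ch == 1, "only single-channel (first-layer) filters"
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * (fh + pad) - pad, cols * (fw + pad) - pad))
    for i in range(n):
        r, c = divmod(i, cols)
        f = weights[:, :, 0, i]
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)  # per-filter normalization
        grid[r * (fh + pad):r * (fh + pad) + fh,
             c * (fw + pad):c * (fw + pad) + fw] = f
    return grid
```

Usage would be something like plt.imshow(filter_grid(w), cmap="gray") after fetching the kernel array from the session.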

Comment on what these filters seem to be recognizing (this can be difficult with small filter sizes such as 5 x 5). Experiment with different filter sizes as well (maybe up to the full 28 x 28?). Check whether there are any redundant filters (i.e. multiple filters recognizing the same pattern) and whether you can achieve similar performance using fewer filters. In principle, such redundancy checking can be done for higher layers as well, but note that there each filter has as many channels as there are filters in the layer below, so you would need to visualize the channels separately.
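A simple quantitative redundancy check is to flatten each filter and compute pairwise correlations: highly (anti-)correlated filters detect essentially the same pattern. A minimal NumPy sketch, assuming the same (fh, fw, in_ch, n_filters) kernel shape as above (redundant_pairs is an illustrative name):

```python
import numpy as np

def redundant_pairs(weights, threshold=0.9):
    """Return filter pairs (i, j, corr) whose |correlation| exceeds threshold.

    weights: conv kernel of shape (fh, fw, in_ch, n_filters).
    Each filter is flattened, mean-centered, and normalized, so the
    dot products are Pearson correlations between filter weights.
    """
    fh, fw, in_ch, n = weights.shape
    flat = weights.reshape(-1, n).T                    # one row per filter
    flat = flat - flat.mean(axis=1, keepdims=True)     # center
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    corr = flat @ flat.T
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, corr[i, j]))
    return pairs

# Toy check: make filters 0 and 1 identical -- they should be flagged.
np.random.seed(0)
w = np.random.randn(3, 3, 1, 4)
w[:, :, :, 1] = w[:, :, :, 0]
print(redundant_pairs(w))
```

If many pairs exceed the threshold, that is a hint you could retrain with fewer filters and expect similar performance.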

Note: Accessing the filters when using the layers API is a bit awkward because the variables are created “under the hood”. This is particularly true if you use something like y = tf.layers.conv2d(x, ...). Instead, you could use conv_layer = tf.layers.Conv2D(...); y = conv_layer.apply(x). This gives the same result, but lets you access the layer’s parameters via conv_layer.trainable_weights. See here for some examples of using tf.layers (and also tf.data); you can ignore anything mentioning “feature columns”.