WS 2017/18

Introduction to Deep Learning

 

Assignment 3: CNNs & The Dataset API

In this assignment, you will create a better model for the MNIST dataset using convolutional neural networks. You will also get to know TensorFlow’s (current) main way of feeding data to the training process, which will be useful for more complex datasets.

CNN for MNIST

You should have seen that modifying layer sizes, changing activation functions, etc. is simple: you can generally change parts of the model without affecting the rest of the program. In fact, you can swap out the full pipeline from input to model output without having to change anything else.

Replace your MLP with a CNN. You can check this tutorial for an example. You can ignore dropout for now. Note: Depending on your machine, training a CNN may take much longer than the MLPs we’ve seen so far. Also, processing the full test set in one go for evaluation might be too much for your RAM. In that case, you could break up the test set into smaller chunks and combine the per-chunk results, as sketched below.
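For orientation, here is a minimal sketch of what such a model could look like with the tf.layers API, including a chunked test-set evaluation. All layer sizes, names (e.g. "conv1") and the chunk size are arbitrary illustrative choices, not requirements of the assignment:

```python
import tensorflow as tf

# Flat 784-dimensional MNIST images and integer labels, fed as before.
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])

# Conv layers expect NHWC inputs, so reshape the flat vectors into 28x28x1 images.
inputs = tf.reshape(images, [-1, 28, 28, 1])

conv1 = tf.layers.conv2d(inputs, filters=32, kernel_size=5, padding="same",
                         activation=tf.nn.relu, name="conv1")
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)
conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=5, padding="same",
                         activation=tf.nn.relu, name="conv2")
pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)

flat = tf.reshape(pool2, [-1, 7 * 7 * 64])  # 28 -> 14 -> 7 after two 2x2 poolings
hidden = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
n_correct = tf.reduce_sum(
    tf.cast(tf.equal(tf.argmax(logits, axis=1), labels), tf.float32))


def evaluate(sess, test_images, test_labels, chunk_size=1000):
    """Evaluate test accuracy in chunks to keep memory usage low."""
    total_correct = 0.0
    for start in range(0, len(test_images), chunk_size):
        total_correct += sess.run(n_correct, feed_dict={
            images: test_images[start:start + chunk_size],
            labels: test_labels[start:start + chunk_size]})
    return total_correct / len(test_images)
```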

Having set up your basic CNN, you should include some visualization. In particular, one thing that is often used to diagnose CNN performance is visualizing the filters, i.e. the weights of the convolutional layers. The only filters that are straightforward to interpret are the ones in the first layer, since they operate directly on the input. The filter matrix should have shape filter_height x filter_width x 1 x n_filters. Visualize the n_filters filters as images using tf.summary.image. This way, you can even see how the filters develop over training. Comment on what these filters seem to be recognizing (this can be difficult with small filter sizes such as 5 x 5). Experiment with different filter sizes as well (maybe up to 28 x 28?).

See if there are any redundant filters (i.e. multiple filters recognizing the same patterns) and whether you can achieve similar performance using fewer filters. In principle, such redundancy checking can be done for higher layers as well, but note that there each filter has as many channels as there are filters in the layer below (you would need to visualize these separately).

Note: Accessing the filters when using the layers API is not trivial because they are created “under the hood”. Check out tf.get_collection for a way to get them (and feel free to share any other ways you can find ;)).
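For example, assuming the first convolutional layer was created with name="conv1" (as in the sketch above; the name is an arbitrary choice), one possible way to fetch its kernel via tf.get_collection and turn it into image summaries is:

```python
# The layers API creates the kernel variable "under the hood"; one way to get it
# back is to search the trainable-variables collection by the layer's name scope.
conv1_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="conv1")
kernel = [v for v in conv1_vars if "kernel" in v.name][0]
# kernel has shape [filter_height, filter_width, 1, n_filters].

# tf.summary.image expects [batch, height, width, channels], so move the filter
# dimension to the front and treat every filter as a single grayscale image.
filters_as_images = tf.transpose(kernel, [3, 0, 1, 2])
tf.summary.image("conv1_filters", filters_as_images, max_outputs=32)
```

Writing this summary alongside your other summaries lets you watch the filters evolve in TensorBoard over the course of training.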

Datasets

It should go without saying that loading numpy arrays and taking slices of these as batches isn’t a great way of providing data to the training algorithm. For example, what if we are working with a dataset that doesn’t fit into memory? The currently recommended way of handling datasets is via the tf.contrib.data module. Now is a good time to take some first steps with this module. Read the Programmer’s Guide section on this. You can ignore the parts on high-level APIs as well as anything regarding TFRecords and tf.Example (we will get to these later). Then work through the tasks below.
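To give you something concrete to start from, here is a minimal sketch of a basic input pipeline built from numpy arrays. It is written against tf.data.Dataset; depending on your TensorFlow version the same classes may live under tf.contrib.data, and all buffer and batch sizes are arbitrary choices:

```python
import numpy as np
import tensorflow as tf

# Stand-in data; in the assignment these would be the MNIST training arrays.
train_images = np.random.rand(1000, 784).astype(np.float32)
train_labels = np.random.randint(0, 10, size=1000).astype(np.int64)

dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
dataset = dataset.shuffle(buffer_size=1000)  # shuffle within a buffer of 1000 elements
dataset = dataset.batch(64)                  # group consecutive elements into batches
dataset = dataset.repeat()                   # start over once the data is exhausted

iterator = dataset.make_one_shot_iterator()
image_batch, label_batch = iterator.get_next()

# image_batch and label_batch are tensors: the model can be built directly on top
# of them, and each session run pulls a fresh batch, so no feed_dict is needed.
with tf.Session() as sess:
    imgs, lbls = sess.run([image_batch, label_batch])
    print(imgs.shape, lbls.shape)  # (64, 784) (64,)
```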

Note that the TensorFlow guide often uses the three operations shuffle, batch and repeat. Think about how the results differ when you change the order of these operations (there are six possible orderings in total). You can experiment with a simple Dataset.range dataset; a small sketch to start from follows after the next paragraph. Note how no ordering implements the following scenario:

Can you find a way to implement this scenario? Hint: You will want to look at iterator types other than the simple one-shot iterator.
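As a starting point for both parts of this task, the sketch below first compares two of the six orderings on a tiny Dataset.range dataset and then shows an initializable iterator that is re-initialized once per epoch. This is only one possible approach with arbitrary sizes, not necessarily the intended solution:

```python
import tensorflow as tf

# Two of the six orderings of shuffle/batch/repeat on a toy dataset of 0..9.
ds_a = tf.data.Dataset.range(10).shuffle(10).batch(4).repeat(2)
ds_b = tf.data.Dataset.range(10).repeat(2).shuffle(10).batch(4)

with tf.Session() as sess:
    for name, ds in [("shuffle->batch->repeat", ds_a),
                     ("repeat->shuffle->batch", ds_b)]:
        next_elem = ds.make_one_shot_iterator().get_next()
        print(name)
        while True:
            try:
                print(sess.run(next_elem))  # compare how batches and epochs mix
            except tf.errors.OutOfRangeError:
                break

# An initializable iterator can be re-initialized, e.g. once per epoch, which
# makes it easy to do something (reshuffle, evaluate, ...) between epochs.
dataset = tf.data.Dataset.range(10).shuffle(10).batch(4)
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    for epoch in range(3):
        sess.run(iterator.initializer)  # start a fresh pass over the data
        while True:
            try:
                sess.run(next_batch)
            except tf.errors.OutOfRangeError:
                break  # end of this epoch reached
        print("finished epoch", epoch)
```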