WS 2017/18

Introduction to Deep Learning

 

Assignment 3: CNNs & The Dataset API

In this assignment, you will create a better model for the MNIST dataset using convolutional neural networks. You will also get to know TensorFlow’s (current) main way of feeding data to the training process, which will be useful for more complex datasets.

CNN for MNIST

You should have seen that modifying layer sizes, changing activation functions, etc. is simple: you can generally change parts of the model without affecting the rest of the program. In fact, you can swap out the full pipeline from input to model output without having to change anything else.

Replace your MLP with a CNN. You can check this tutorial for an example. You can ignore dropout for now. Note: Depending on your machine, training a CNN may take much longer than the MLPs we’ve seen so far. Also, processing the full test set in one go for evaluation might be too much for your RAM. In that case, you could break up the test set into smaller chunks and combine the per-chunk results, as sketched below.
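For orientation, here is a minimal sketch of what such a model could look like with the tf.layers API, including a chunked test-set evaluation. All layer sizes, names (e.g. "conv1") and the chunk size are arbitrary illustrative choices, not requirements of the assignment:

```python
import tensorflow as tf

# Flat 784-dimensional MNIST images and integer labels, fed as before.
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])

# Conv layers expect NHWC inputs, so reshape the flat vectors into 28x28x1 images.
inputs = tf.reshape(images, [-1, 28, 28, 1])

conv1 = tf.layers.conv2d(inputs, filters=32, kernel_size=5, padding="same",
                         activation=tf.nn.relu, name="conv1")
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)
conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=5, padding="same",
                         activation=tf.nn.relu, name="conv2")
pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)

flat = tf.reshape(pool2, [-1, 7 * 7 * 64])  # 28 -> 14 -> 7 after two 2x2 poolings
hidden = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
n_correct = tf.reduce_sum(
    tf.cast(tf.equal(tf.argmax(logits, axis=1), labels), tf.float32))


def evaluate(sess, test_images, test_labels, chunk_size=1000):
    """Evaluate test accuracy in chunks to keep memory usage low."""
    total_correct = 0.0
    for start in range(0, len(test_images), chunk_size):
        total_correct += sess.run(n_correct, feed_dict={
            images: test_images[start:start + chunk_size],
            labels: test_labels[start:start + chunk_size]})
    return total_correct / len(test_images)
```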

Having set up your basic CNN, you should include some visualization. In particular, one thing that is often used to diagnose CNN performance is visualizing the filters, i.e. the weights of the convolutional layers. The only filters that are straightforward to interpret are the ones in the first layer, since they operate directly on the input. The filter matrix should have shape filter_height x filter_width x 1 x n_filters. Visualize the n_filters filters as images using tf.summary.image. This way, you can even see how the filters develop over training. Comment on what these filters seem to be recognizing (this can be difficult with small filter sizes such as 5 x 5). Experiment with different filter sizes as well (maybe up to 28 x 28?).

See if there are any redundant filters (i.e. multiple filters recognizing the same patterns) and whether you can achieve similar performance using fewer filters. In principle, such redundancy checking can be done for higher layers as well, but note that there each filter has as many channels as there are filters in the layer below (you would need to visualize these separately).

Note: Accessing the filters when using the layers API is not trivial because they are created “under the hood”. Check out tf.get_collection for a way to get them (and feel free to share any other ways you can find ;)).
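For example, assuming the first convolutional layer was created with name="conv1" (as in the sketch above; the name is an arbitrary choice), one possible way to fetch its kernel via tf.get_collection and turn it into image summaries is:

```python
# The layers API creates the kernel variable "under the hood"; one way to get it
# back is to search the trainable-variables collection by the layer's name scope.
conv1_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="conv1")
kernel = [v for v in conv1_vars if "kernel" in v.name][0]
# kernel has shape [filter_height, filter_width, 1, n_filters].

# tf.summary.image expects [batch, height, width, channels], so move the filter
# dimension to the front and treat every filter as a single grayscale image.
filters_as_images = tf.transpose(kernel, [3, 0, 1, 2])
tf.summary.image("conv1_filters", filters_as_images, max_outputs=32)
```

Writing this summary alongside your other summaries lets you watch the filters evolve in TensorBoard over the course of training.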

Datasets

It should go without saying that loading numpy arrays and taking slices of these as batches isn’t a great way of providing data to the training algorithm. For example, what if we are working with a dataset that doesn’t fit into memory? The currently recommended way of handling datasets is via the tf.contrib.data module. Now is a good time to take some first steps with this module. Read the Programmer’s Guide section on this. You can ignore the parts on high-level APIs as well as anything regarding TFRecords and tf.Example (we will get to these later). Then work through the tasks below.
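To give you something concrete to start from, here is a minimal sketch of a basic input pipeline built from numpy arrays. It is written against tf.data.Dataset; depending on your TensorFlow version the same classes may live under tf.contrib.data, and all buffer and batch sizes are arbitrary choices:

```python
import numpy as np
import tensorflow as tf

# Stand-in data; in the assignment these would be the MNIST training arrays.
train_images = np.random.rand(1000, 784).astype(np.float32)
train_labels = np.random.randint(0, 10, size=1000).astype(np.int64)

dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
dataset = dataset.shuffle(buffer_size=1000)  # shuffle within a buffer of 1000 elements
dataset = dataset.batch(64)                  # group consecutive elements into batches
dataset = dataset.repeat()                   # start over once the data is exhausted

iterator = dataset.make_one_shot_iterator()
image_batch, label_batch = iterator.get_next()

# image_batch and label_batch are tensors: the model can be built directly on top
# of them, and each session run pulls a fresh batch, so no feed_dict is needed.
with tf.Session() as sess:
    imgs, lbls = sess.run([image_batch, label_batch])
    print(imgs.shape, lbls.shape)  # (64, 784) (64,)
```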

Note that the TensorFlow guide often uses the three operations shuffle, batch and repeat. Think about how the results differ when you change the order of these operations (there are six possible orderings in total). You can experiment with a simple Dataset.range dataset; a small sketch to start from follows after the next paragraph. Note how no ordering implements the following scenario:

Can you find a way to implement this scenario? Hint: You will want to look at iterator types other than the simple one-shot iterator.
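As a starting point for both parts of this task, the sketch below first compares two of the six orderings on a tiny Dataset.range dataset and then shows an initializable iterator that is re-initialized once per epoch. This is only one possible approach with arbitrary sizes, not necessarily the intended solution:

```python
import tensorflow as tf

# Two of the six orderings of shuffle/batch/repeat on a toy dataset of 0..9.
ds_a = tf.data.Dataset.range(10).shuffle(10).batch(4).repeat(2)
ds_b = tf.data.Dataset.range(10).repeat(2).shuffle(10).batch(4)

with tf.Session() as sess:
    for name, ds in [("shuffle->batch->repeat", ds_a),
                     ("repeat->shuffle->batch", ds_b)]:
        next_elem = ds.make_one_shot_iterator().get_next()
        print(name)
        while True:
            try:
                print(sess.run(next_elem))  # compare how batches and epochs mix
            except tf.errors.OutOfRangeError:
                break

# An initializable iterator can be re-initialized, e.g. once per epoch, which
# makes it easy to do something (reshuffle, evaluate, ...) between epochs.
dataset = tf.data.Dataset.range(10).shuffle(10).batch(4)
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    for epoch in range(3):
        sess.run(iterator.initializer)  # start a fresh pass over the data
        while True:
            try:
                sess.run(next_batch)
            except tf.errors.OutOfRangeError:
                break  # end of this epoch reached
        print("finished epoch", epoch)
```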