Introduction to Deep Learning
In this assignment, you will reimplement the classic LeNet architecture, one of the first CNNs. Additionally, you will get to know a TensorBoard tool that helps with visualizing high-dimensional spaces such as the hidden activations of neural networks.
Implement LeNet based on this visual description. Note that “subsampling” is just average pooling (with stride 2), and of course you will need 10 output units instead of 26.
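As a sanity check for the subsampling layers, the following NumPy sketch applies average pooling with stride 2 to a single feature map. The 2x2 window size and the helper name `average_pool_2x2` are assumptions for illustration (the text above only fixes the stride). In your actual model you would use your framework's pooling op instead.

```python
import numpy as np

def average_pool_2x2(feature_map):
    """Average-pool a 2D feature map with a 2x2 window and stride 2.

    Hypothetical helper for illustration; assumes even height and width.
    """
    height, width = feature_map.shape
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    # Split each spatial axis into (blocks, 2) and average over every 2x2 block.
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.mean(axis=(1, 3))

example = np.arange(16, dtype=np.float64).reshape(4, 4)
pooled = average_pool_2x2(example)
print(pooled)  # each entry is the mean of one non-overlapping 2x2 block
```

Each pooling layer halves the spatial resolution, which is worth keeping in mind when you work out the input size of the first fully-connected layer.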
In past assignments, we have seen that it is fairly simple to visualize the weights of a linear model and get an intuitive idea of their meaning. This is not so simple for deep models involving hidden layers. Convolutional layers can be visualized thanks to their known spatial structure, but no such structure exists for fully-connected layers. Luckily, TensorBoard comes with a tool to visualize such hidden spaces, which can give an impression of how well they represent the data and separate the different classes.
This TensorFlow tutorial gives a short overview of what “embeddings” are (in our case, we can think of each layer of a network as computing an embedding of the input) as well as of how to use TensorBoard’s built-in Embedding Projector. Unfortunately, the tutorial seems to have been cut down in a recent update, and a lot of vital information is missing. A more complete version can be found here. The tutorial mentions tf.train.SummaryWriter at some point; replace this with tf.summary.FileWriter.
Try some visualizations for an MNIST model (MLP or CNN). You might want to proceed in the following steps:
Set up a tf.train.Saver. Ignore anything regarding metadata for now. For example, you might want to visualize the activations of some hidden layer on (a part of) the test set. There’s a bit of a dilemma here: you will want to visualize the activations of a trained network, but it seems that the Saver only stores variables defined before the Saver itself. Also, it is bad practice to add variables/ops to your graph after a session has been launched. The following is a little annoying, but works (if you come up with a better way, feel free to share it!):
    Create a tf.Variable with the appropriate shape to hold your embeddings (e.g. 1000x100 if you want to visualize a 100-dimensional hidden layer on 1000 elements of the test set). Fill it with arbitrary values; they will be overwritten.
    Create a tf.assign op that assigns the hidden activations to the embedding variable.
Note: If you use relative paths for the metadata/sprite image, these paths should be relative to the saver path. That is, if the saver saves to the logdir, the metadata paths should be relative to the logdir. The TensorFlow tutorial gives a different impression here.
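To illustrate the note on relative paths, here is a minimal sketch that writes a label file into the log directory and computes the path you would put into the projector configuration. The helper name `write_label_metadata`, the file name `metadata.tsv`, the temporary log directory, and the label values are all assumptions for illustration. The layout (one label per line, no header row) is the single-column metadata format the Embedding Projector accepts.

```python
import os
import tempfile

def write_label_metadata(logdir, labels, filename="metadata.tsv"):
    """Write one label per line and return the path relative to logdir.

    Hypothetical helper; the file and function names are placeholders.
    """
    os.makedirs(logdir, exist_ok=True)
    full_path = os.path.join(logdir, filename)
    with open(full_path, "w") as handle:
        for label in labels:
            # Single-column metadata: no header row, one line per embedded example.
            handle.write(f"{label}\n")
    # This relative path is what the projector config should reference
    # when the saver writes its checkpoints to the same logdir.
    return os.path.relpath(full_path, start=logdir)

logdir = tempfile.mkdtemp()   # stand-in for your real log directory
labels = [7, 2, 1, 0, 4]      # placeholder labels, not real results
metadata_path = write_label_metadata(logdir, labels)
print(metadata_path)
```

The order of the labels has to match the order of the rows in the embedding variable, so write them from the same slice of the test set that you feed through the network.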
Now that we have some pretty images, you should consider what they actually mean. The n-dimensional data we supplied has automatically been reduced to two or three dimensions. You can choose between two methods for this: t-SNE or PCA. The TensorFlow tutorial has several links explaining these methods. Definitely read this one on t-SNE. Here is another one on PCA. Play around with the different options such as 2D/3D, the t-SNE hyperparameters, or whether to sphereize the data (also note the effect on the % of explained variance for PCA!).
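If you want to reproduce the “% explained variance” number outside of TensorBoard, the following NumPy sketch computes it for the first two principal components, with and without sphereizing. It assumes that sphereizing means shifting the points by their centroid and scaling each one to unit norm; the function names and the random stand-in data are hypothetical, so substitute your own hidden activations.

```python
import numpy as np

def sphereize(points):
    """Center the points on their centroid and scale each to unit norm."""
    centered = points - points.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)  # guard against zero vectors

def explained_variance_ratio(points, n_components=2):
    """Fraction of total variance captured by the leading principal components."""
    centered = points - points.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal axes.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[:n_components].sum() / variances.sum()

rng = np.random.default_rng(0)
# Stand-in for hidden activations: 1000 examples, 100 dimensions of unequal scale.
activations = rng.normal(size=(1000, 100)) * np.linspace(5.0, 0.1, 100)

ratio_raw = explained_variance_ratio(activations)
ratio_sphereized = explained_variance_ratio(sphereize(activations))
print(ratio_raw, ratio_sphereized)
```

A low ratio means that the 2D PCA plot hides most of the structure in the data, so distances in the picture should be read with caution.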
Finally, use these visualizations to gain some insights. For example, you could visualize the different layers of a network, from input to output, and observe how the data is progressively “disentangled”. You can also visualize convolutional layers by flattening the feature maps (i.e. reshaping the activations to a 2D matrix with one row per example). It might be interesting to see how pooling changes the embeddings. Also, compare the pre-output layers of two models with differing performances – which one seems to have the “better” representation? Note that for linear models the pre-output layer is actually the raw input.
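Flattening the feature maps can be sketched as follows. The helper name `flatten_feature_maps`, the batch-first layout (examples, height, width, channels), and the random stand-in activations are assumptions for illustration; in practice you would apply the same reshape to the output of a convolutional or pooling layer.

```python
import numpy as np

def flatten_feature_maps(activations):
    """Reshape (examples, height, width, channels) to (examples, features)."""
    # Keep the example axis and merge all remaining axes into one.
    return activations.reshape(activations.shape[0], -1)

rng = np.random.default_rng(0)
# Stand-in for the activations of a conv layer on 8 examples: 14x14 maps, 6 channels.
conv_activations = rng.normal(size=(8, 14, 14, 6))

embedding_matrix = flatten_feature_maps(conv_activations)
print(embedding_matrix.shape)  # (8, 1176), since 14 * 14 * 6 = 1176
```

Each row of the result is one point in the embedding space, so its shape is also the shape the embedding variable needs to have.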
In general, do these visualizations make sense to you? Look out for some peculiarities of the MNIST data. Are some digits clustered more tightly than others? Look for outliers (with regard to a specific digit cluster) – do these generally look atypical? Which digits are “neighbors” in this space? You might want to concentrate on (non-sphereized) PCA for these questions.