WS 2017/18

Introduction to Deep Learning


Assignment 2: Visualization

Visualizing the learning progress as well as the behavior of a deep model is extremely useful (if not necessary) for troubleshooting in case of unexpected outcomes (or just bad results). In this assignment, you will get to know TensorBoard, TensorFlow’s built-in visualization suite, and use it to diagnose some common problems with training deep models. NOTE: TensorBoard seems to work best with Chrome-based browsers. Other browsers may take a very long time to load, or may not display the data correctly.

First Steps with TensorBoard

As before, you will need to do some extra reading to learn how to use TensorBoard. There are several tutorials on the TensorFlow website; start with the basic one. Instead of following the (somewhat bloated) example there, you can simply integrate some summary operations into your MLP from the last assignment. The basic steps are: (1) attach summary ops (e.g. tf.summary.scalar or tf.summary.histogram) to the tensors you want to track, (2) merge them into a single op with tf.summary.merge_all, (3) create a tf.summary.FileWriter pointing to a log directory, (4) periodically evaluate the merged summary op and pass the result to the writer via add_summary, and (5) launch tensorboard --logdir=<your log directory> and open the displayed URL in your browser.
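
Below is a minimal sketch of how these pieces could fit into your training loop from the last assignment (names such as loss, accuracy, inputs, labels, num_steps and the log directory are placeholders for whatever your own code uses):

# assuming loss and accuracy are tensors defined in your MLP graph
tf.summary.scalar("loss", loss)
tf.summary.scalar("accuracy", accuracy)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    # passing the graph also makes it show up in the Graphs tab
    writer = tf.summary.FileWriter("logs/run1", sess.graph)
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        _, summary = sess.run([train_step, merged],
                              feed_dict={inputs: batch_x, labels: batch_y})
        writer.add_summary(summary, step)
    writer.close()
# afterwards: tensorboard --logdir logs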

Histogram summaries (useful to get a quick impression of things such as activation values) are treated in more detail here. TensorBoard also visualizes the computation graph of your model. Draw a graph of your model yourself and compare it to the TensorFlow graph. Play around with different choices of variable_scope or name_scope in your model and see how this changes the graph display; a small example follows below. Also note how you can get live statistics on memory usage, run time, etc. Finally, check out the GitHub README for more information on how to use the actual app.
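
For instance, wrapping parts of the model in name scopes (the scope and layer names here are only illustrative) groups the corresponding ops into collapsible boxes in the Graphs tab:

with tf.name_scope("hidden_layer"):
    hidden = tf.layers.dense(inputs, 256, activation=tf.nn.relu)
with tf.name_scope("output_layer"):
    logits = tf.layers.dense(hidden, 10)

Swapping tf.name_scope for tf.variable_scope additionally prefixes the names of the variables created inside (since tf.layers uses tf.get_variable), which you will see reflected in the graph and in any summary names.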

Diagnosing Problems via Visualization

Download this archive containing a few Python scripts. Each script trains a simple MLP on MNIST, and each of them is expected to fail at training in some way. For each example, use visualization to find out why this happens. Also, try to propose fixes for these issues.

Note 1: We ask you to use the “hand-made” MNIST dataset this time (see the last assignment). This is because some of these examples are supposed to show off rather volatile behavior that might not appear with the TensorFlow version of the data (due to differences in preprocessing). You might run into an issue where the file names differ slightly from those in the scripts, but this should be solvable. If there are any big issues, please contact Jens and he will try to fix them as soon as possible.

Note 2: For the same reason (volatility), please don’t change the parameters of the network or the learning algorithm before you have seen the original behavior. You can of course use these parameters as “clues” as to what might be going wrong.

Note 3: The examples use the tf.layers interface. The definitions should be fairly intuitive; check the API docs to learn more. A short example follows below.
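
As a quick reference, a hidden layer plus output layer might look like this in the tf.layers interface (sizes and names are just an example):

hidden = tf.layers.dense(inputs, units=256, activation=tf.nn.relu, name="hidden")
logits = tf.layers.dense(hidden, units=10, name="logits")  # no activation: raw logits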

For the fourth example, tf.summary.image should be useful. Note that you need to reshape the inputs from flat vectors to 28x28 images and add an extra axis for the color channel (even though there is only one channel). Check out tf.reshape and tf.expand_dims; a sketch is given below.
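
A possible sketch, assuming inputs is a batch of flattened 784-dimensional vectors (the summary name and max_outputs are arbitrary):

images = tf.reshape(inputs, [-1, 28, 28])   # back to 2D images
images = tf.expand_dims(images, -1)         # add the color channel: [batch, 28, 28, 1]
tf.summary.image("inputs", images, max_outputs=3)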

Otherwise, it should be helpful to visualize histograms/distributions of layer activations and see if anything jumps out. You should also look at the gradients of the network; if these are “unusual” (i.e. extremely small or large), something is probably wrong. Accessing gradients is a bit annoying as you need to split up the minimize() step:

train_step = SomeOptimizer().minimize(cost)

becomes

optimizer = SomeOptimizer()
grads_and_vars = optimizer.compute_gradients(cost)  # list of (gradient, variable) tuples
train_step = optimizer.apply_gradients(grads_and_vars)

However, you can now get a list of the per-variable gradients via [g for g, v in grads_and_vars]. An overall impression of a gradient’s size can be gained via tf.norm(g); feel free to add scalar summaries of these values to TensorBoard. To get a sensible name for each summary, maybe make use of the v.name attribute of the corresponding variable. If you only want to pick out the gradients of the weight matrices (biases are usually less interesting), try picking only those variables that have “kernel” in their name. A sketch follows below.
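
Putting this together, gradient-norm summaries for the weight matrices could be added roughly like this (continuing from the split-up training step above):

for g, v in grads_and_vars:
    if g is not None and "kernel" in v.name:
        # e.g. "hidden/kernel:0" becomes the summary name "grad_norm/hidden/kernel"
        tf.summary.scalar("grad_norm/" + v.name.replace(":0", ""), tf.norm(g))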

Bonus

Like last week, play around with the parameters of your networks. Use TensorBoard to get more information about how some of your choices affect behavior. For example, you could compare the long-term behavior of saturating activation functions such as tanh with that of ReLU.

For this and all future assignments: Always feel free to try out whatever you want. If you find yourself wondering “I wonder what would happen if I did x”, and you have the time, do x! Report anything you find cool in the “Assignments” channel on Mattermost.