Introduction to Deep Learning
tf.data
Deadline: Sunday October 28, 6PM
You can send me your assignment by one of those methods:
Visualizing the learning progress as well as the behavior of a deep model is extremely useful (if not necessary) for troubleshooting in case of unexpected outcomes (or just bad results). In this assignment, you will get to know TensorBoard, Tensorflow’s built-in visualization suite, and use it to diagnose some common problems with training deep models. NOTE: TensorBoard seems to work best with Chrome-based browsers. Other browsers may take a very long time to load, or not display the data correctly.
As before, you will need to do some extra reading to learn how to use TensorBoard. There are several tutorials on the Tensorflow website. Start with the basic one. Instead of following the (somewhat bloated) example, you could just integrate some summary operations into your MLP from the last assignment. The basic steps are:
- Create summary nodes for anything you're interested in.
- Create a FileWriter for some log directory.

Histogram summaries (useful to get a quick impression of things such as activation values) are treated in more detail here.
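A minimal sketch of how these pieces fit together in the TF1-style API (the tiny model and the random dummy batch are stand-ins for your own MLP and the real MNIST data); note that you also need to evaluate the summary ops during training and pass the results to the FileWriter:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])

logits = tf.layers.dense(x, 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

tf.summary.scalar("loss", loss)    # a summary node for anything of interest
merged = tf.summary.merge_all()    # bundles all summary nodes into a single op
writer = tf.summary.FileWriter("./logs", tf.get_default_graph())  # some log directory

# dummy batch standing in for real MNIST data
xs = np.random.rand(32, 784).astype(np.float32)
ys = np.random.randint(0, 10, size=32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10):
        summary, _ = sess.run([merged, train_step], feed_dict={x: xs, y: ys})
        writer.add_summary(summary, step)   # hand the evaluated summaries to the writer
writer.close()

After running this, tensorboard --logdir ./logs should show the loss curve in the Scalars tab.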
TensorBoard also visualizes the computation graph of your model. Draw a graph for your model yourself, and compare it to the Tensorflow graph. Play around with different choices of variable_scope or name_scope in your model and see how this changes the graph display. Also note how you can get live statistics on memory usage, run time etc. Finally, check out the GitHub readme for more information on how to use the TensorBoard app itself.
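As a small, self-contained illustration of how scopes affect the graph display (the layer sizes and scope names here are arbitrary), here is a minimal sketch:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name="inputs")

# everything created inside a scope is grouped into one collapsible node in the Graphs tab
with tf.name_scope("hidden_layer"):
    hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)
with tf.name_scope("output_layer"):
    logits = tf.layers.dense(hidden, 10)

# writing the default graph is enough to inspect it in TensorBoard
tf.summary.FileWriter("./logs/graph_demo", tf.get_default_graph()).close()

Try swapping name_scope for variable_scope and compare how the layer variables are grouped.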
Note: The above are “suggestions” – no need to hand them in.
Finally, running TensorBoard on Colab requires some extra work. Feel free to develop on your machine instead. See this stackoverflow question for how to run TensorBoard on Colab.
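If your Colab runtime ships a recent enough TensorBoard, the notebook extension is an alternative to the tunnelling approach described in that question; a minimal sketch, assuming your summaries are written to ./logs:

# in a Colab/Jupyter cell
%load_ext tensorboard
%tensorboard --logdir ./logs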
Download this archive containing a few Python scripts. Each of them is a simple MLP training script for MNIST, and each of them should fail at training in some way. For each example, find out through visualization why this is, and try to propose fixes for these issues. Note that all these scripts are set to write summaries every training step; you normally wouldn't want to do this, but it can be useful for debugging.
Note 1: If you used the MNIST dataset that comes with Tensorflow up to now, please use the “hand-made” MNIST dataset this time (see last assignment). This is because some of these examples are supposed to show off rather volatile behavior that might not appear with the Tensorflow version of this data (due to differences in preprocessing).
Note 2: For the same reason (volatility), please don’t mess with the parameters of the network or learning algorithm before experiencing the original. You can of course use any oddities you notice as clues as to what might be going wrong.
Note 3: The examples use the tf.layers interface. The definitions should be fairly intuitive. Check the API to learn more. There is also a section on this interface in the low-level introduction.
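For reference, a fully connected layer in this interface is a single call; a minimal sketch (the layer size and names are arbitrary, and x stands in for whatever input tensor your model uses):

# 784-dimensional input -> 128 hidden units with ReLU activation
hidden = tf.layers.dense(x, units=128, activation=tf.nn.relu, name="hidden1")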
Sometimes it can be useful to have a look at the inputs your model actually receives. tf.summary.image helps here. Note that you need to reshape the inputs from vectors to 28x28-sized images and add an extra axis for the color channel (despite there being only one channel). Check out tf.reshape and tf.expand_dims.
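A minimal sketch of this, assuming your model takes flattened MNIST vectors as input:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # stand-in for your model's input batch
imgs = tf.reshape(x, [-1, 28, 28])            # flat vectors back to 28x28 images
imgs = tf.expand_dims(imgs, -1)               # add the single color channel: [batch, 28, 28, 1]
tf.summary.image("inputs", imgs, max_outputs=3)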
Otherwise, it should be helpful to visualize histograms/distributions of layer activations and see if anything jumps out. Note that histogram summaries will crash your program if NaN values appear; in that case, see if you can do without the histograms and use other means to find out what is going wrong.
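For instance, if hidden1 is one of your layer activation tensors (the name is an assumption), a single extra node suffices:

tf.summary.histogram("hidden1_activations", hidden1)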
You should also look at the gradients of the network; if these are “unusual” (i.e. extremely small or large), something is probably wrong. Accessing gradients is a bit annoying as you need to split up the minimize() step:
train_step = SomeOptimizer().minimize(cost)
becomes
optimizer = SomeOptimizer()
grads_and_vars = optimizer.compute_gradients(cost)  # list of (gradient, variable) tuples
train_step = optimizer.apply_gradients(grads_and_vars)
However, you can now get a list of the per-variable gradients via [g for g, v in grads_and_vars]. An overall impression of a gradient's size can be gained via tf.norm(g); feel free to add scalar summaries of these values to TensorBoard. To get a sensible name for these summaries, maybe make use of the v.name of the corresponding variable. If you only want to pick out gradients for the weight matrices (biases are usually less interesting), try picking only those variables that have kernel in their name.
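Continuing the snippet above, such summaries could look roughly like this (the name prefix is arbitrary):

for g, v in grads_and_vars:
    # only the weight matrices; biases are usually less interesting
    if g is not None and "kernel" in v.name:
        # v.name looks like "dense/kernel:0"; strip the ":0" to get a valid summary name
        tf.summary.scalar("grad_norm/" + v.name.replace(":0", ""), tf.norm(g))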
Like last week, play around with the parameters of your networks. Use TensorBoard to get more information about how some of your choices affect behavior. For example, you could compare the long-term behavior of saturating functions such as tanh with that of relu, look at how the gradients vary for different architectures, etc.
It should go without saying that loading numpy arrays and taking slices of these as batches isn’t a great way of providing data to the training algorithm. For example, what if we are working with a dataset that doesn’t fit into memory? Also, the method you’ve been using so far to provide data to your models (placeholder and feed dicts) is inefficient as noted in the Performance Guide.
The currently recommended way of handling datasets is via the tf.data module. Now is a good time to take some first steps with it. Read the Programmer's Guide section on this. You can ignore the parts on high-level APIs, anything regarding TFRecords and tf.Example (we will get to these later), and any iterator types except the simple one_shot_iterator. Then try to achieve the following tasks:
- Create a tf.data.Dataset from the data you get out of conversions.py (see Assignment 1).
- Do the same for the output of conversions.py with the -p flag and these text files containing labels. You can assume that the labels appear in the same order as the pictures (assuming you didn't rename them).
Iterate over the dataset and plot some of the incoming data to verify that it works. If you want to, train a model with this new way of inputting data instead.

Please be aware that the MNIST data we provided in the last assignment is stored as ints in the range [0, 255]. You might want to convert the values to floats and divide them by 255 to get the data into the range [0, 1] – input values that are too large can lead to numerical issues.
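A minimal sketch of such a pipeline, using random dummy arrays in place of the real MNIST data (shapes, names, and buffer/batch sizes are assumptions; adapt them to whatever your loading code produces):

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# dummy stand-ins for the real MNIST arrays
images = np.random.randint(0, 256, size=(1000, 784), dtype=np.uint8)
labels = np.random.randint(0, 10, size=1000)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
# convert to float32 in [0, 1] to avoid overly large input values
dataset = dataset.map(lambda img, lbl: (tf.cast(img, tf.float32) / 255., lbl))
dataset = dataset.shuffle(1000).batch(32)

next_images, next_labels = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    batch_images, batch_labels = sess.run([next_images, next_labels])
    plt.imshow(batch_images[0].reshape(28, 28), cmap="gray")
    plt.title("label: {}".format(batch_labels[0]))
    plt.show()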
Note that the Tensorflow guide often uses the three operations shuffle, batch and repeat. Think about how the results differ when you change the order of these operations (there are six orderings in total). You can experiment with a simple Dataset.range dataset, as sketched below. What do you think is the most sensible order?
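For example, a small script like this (the dataset size and batch size are arbitrary) prints the batches for two of the six orderings; swap the calls around to try the rest:

import tensorflow as tf

def show(name, dataset):
    # exhaust a finite dataset with a one-shot iterator and print its batches
    next_elem = dataset.make_one_shot_iterator().get_next()
    print(name)
    with tf.Session() as sess:
        try:
            while True:
                print(sess.run(next_elem))
        except tf.errors.OutOfRangeError:
            pass

base = tf.data.Dataset.range(8)

# shuffling single elements, then batching, then repeating for two epochs
show("shuffle -> batch -> repeat", base.shuffle(8).batch(3).repeat(2))
# batching first means shuffle only permutes whole batches
show("batch -> shuffle -> repeat", base.batch(3).shuffle(8).repeat(2))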