Winter 2018

Introduction to Deep Learning

 

Assignment 7: More Realistic Language Modeling & Recurrent Neural Networks

no submission necessary

In this task, we will once again deal with language modeling. This time, however, we will be using Tensorflow’s RNN functionality, which makes defining models significantly easier. On the flip side, we will be dealing with the issues that come with variable-length inputs, which in turn makes defining models significantly more complicated. You are also asked to try wrapping everything into the high-level Estimator interface, which will require a few workarounds. We will stick with character-level models for now; while word models are more common in practice, they come with additional problems.

Preparing the Data

Once again we provide you with a script to help you process raw text into a format Tensorflow can understand. Download the script here. This script differs from the previous one in a few regards:

This also means that you need to provide the data to Tensorflow in a slightly different way during training: At the end of the day, Tensorflow works on tensors, and these have fixed sizes in each dimension. That is, sequences of different lengths can’t be put in the same tensor (and thus not in the same batch). The standard workaround for this issue is padding: Sequences are filled up with “dummy values” until they all have the same length (namely, that of the longest sequence in the batch). The most straightforward approach is to simply append these dummy values at the end, and the most common value to use is 0. Padding is simple in Tensorflow: Use padded_batch instead of batch in tf.data.
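For illustration, a padding pipeline might look like this (a minimal sketch: the batch size is arbitrary, and the hypothetical list sequences stands in for whatever format the preparation script actually produces):

    import tensorflow as tf

    # Each element is a 1-D int32 tensor of character IDs of varying length.
    # `sequences` is a hypothetical list of Python int lists from the script.
    dataset = tf.data.Dataset.from_generator(
        lambda: iter(sequences), output_types=tf.int32, output_shapes=[None])
    # padded_batch pads each sequence with 0 up to the length of the longest
    # sequence in its batch; padded_shapes=[None] marks the variable dimension.
    dataset = dataset.padded_batch(32, padded_shapes=[None])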

Finally, a note on the “input function” for tf.Estimator: You may have read that this should return a two-tuple (features, labels) that will be passed to the model function. However, in this case we don’t really need labels since our targets are just the features shifted by one step. You can simply have your input function return None as the labels. However, you can of course explicitly pass the shifted inputs as labels instead if you prefer this.
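A minimal sketch of such an input function, reusing a dataset of padded batches as above (make_dataset is a hypothetical helper):

    def input_fn():
        dataset = make_dataset()  # hypothetical: returns padded ID batches
        features = dataset.make_one_shot_iterator().get_next()
        # No separate labels: the model function derives its targets by
        # shifting the features one step to the left.
        return features, None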

Building an RNN

Defining an RNN is much simpler when using the full Tensorflow library. Again, there are essentially no official tutorials on this, so here are the basic steps: Create a cell object (e.g. tf.nn.rnn_cell.BasicLSTMCell), unroll it over the input sequence with tf.nn.dynamic_rnn, and map the per-step outputs to logits over the vocabulary with a final dense layer.
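Put together, a sketch of the graph might look like this (TF 1.x API names; vocabulary and cell sizes are placeholder values):

    import tensorflow as tf

    VOCAB_SIZE = 64   # placeholder value
    CELL_SIZE = 256   # placeholder value

    # inputs: int32 tensor of shape [batch_size, max_time] holding character IDs
    one_hot = tf.one_hot(inputs, VOCAB_SIZE)
    cell = tf.nn.rnn_cell.BasicLSTMCell(CELL_SIZE)
    # dynamic_rnn unrolls the cell over the time axis; outputs has shape
    # [batch_size, max_time, CELL_SIZE]
    outputs, final_state = tf.nn.dynamic_rnn(cell, one_hot, dtype=tf.float32)
    # the same output layer is applied at every time step
    logits = tf.layers.dense(outputs, VOCAB_SIZE)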

The very least you should do is re-implement the task from the last assignment using these functionalities. That is, you may work with fixed, known sequence lengths as a start. However, the real task lies ahead, and you may skip the re-implementation and go straight for it if you wish.

Dealing with Variable-length Sequences

You may have noticed that there is one problem remaining: We padded shorter sequences to get valid tensors, but the RNN functions as well as the cost computations have no way of actually knowing that we did this. This means we most likely get nonsensical outputs (and thus costs) for all those sequence elements that correspond to padding. Let’s fix these issues.
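Two standard remedies take care of both problems; a sketch, continuing the snippet above and assuming that ID 0 is used only for padding:

    # Number of real (non-padding) steps per sequence. This assumes the
    # padding value 0 never appears inside a real sequence.
    lengths = tf.reduce_sum(tf.cast(tf.not_equal(inputs, 0), tf.int32), axis=1)

    # sequence_length makes dynamic_rnn stop the state updates and emit
    # zeros past the end of each sequence.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, one_hot, sequence_length=lengths, dtype=tf.float32)
    logits = tf.layers.dense(outputs, VOCAB_SIZE)

    # targets: inputs shifted one step to the left. Per-step cross-entropy has
    # shape [batch_size, max_time]; depending on how exactly you shift
    # inputs/targets, the mask may need lengths - 1 instead.
    step_losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits)
    mask = tf.sequence_mask(lengths, maxlen=tf.shape(inputs)[1], dtype=tf.float32)
    # Average only over the real (unmasked) positions.
    loss = tf.reduce_sum(step_losses * mask) / tf.reduce_sum(mask)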

While this was a lot of explanation, your program should hopefully be more succinct than the previous one, and more flexible as well! Look at the computation graph of this network to see how compactly the RNN is represented. Experiment with different cell types or even multiple layers and see the effects on the cost. Be prepared for significantly longer training times than with feedforward networks such as CNNs.

Sampling in tf.Estimator

Unfortunately, by using tf.Estimator we lose the low-level control needed to feed samples of the network’s output back in as its next input, step by step. To do sampling, you could just write a low-level implementation again. In this case, it works a lot like before: We can input single characters by simply treating them as length-1 sequences. The process should be stopped not after a certain number of steps, but when the special character </S> is sampled (you could also continue sampling and see if your network breaks down…). Once again, supplying the initial state as a placeholder should help – note that if you use a MultiRNNCell, this needs to be a tuple of states.
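A rough sketch of such a loop, assuming a session-based graph with a [1, 1] input placeholder inp, a single state placeholder state_ph (for a GRU cell; LSTM or MultiRNNCell states would be tuples of placeholders, fed element-wise), and ops probs (softmax over the next character) and next_state; char2id/id2char are hypothetical lookup tables from the preparation script:

    import numpy as np

    state = sess.run(cell.zero_state(1, tf.float32))
    current_id = char2id['<S>']  # hypothetical start-of-sequence ID
    generated = []
    while True:
        p, state = sess.run([probs, next_state],
                            feed_dict={inp: [[current_id]], state_ph: state})
        # Sample the next character from the output distribution.
        current_id = np.random.choice(len(p.squeeze()), p=p.squeeze())
        if current_id == char2id['</S>']:
            break
        generated.append(id2char[current_id])
    print(''.join(generated))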

But can we do sampling using tf.Estimator as well? As it turns out, we can slightly abuse the sequence-to-sequence framework in tf.contrib.seq2seq for this task. Consider this part optional as it introduces many additional concepts. However, it can be instructive to learn about how to work around the restrictions of high-level frameworks without having to sacrifice all of their benefits. There is a seq2seq tutorial on the TF website. This deals with machine translation using encoder-decoder architectures. In our case, we basically only have a decoder that generates from a fixed initial state (usually a zero state). To adapt your RNN to allow for random sampling, you need to take the following steps (most of the mentioned classes/functions live in tf.contrib.seq2seq):
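One possible route, sketched for the TF 1.x contrib API (start_id/end_id stand for whatever IDs the preparation script assigns, and batch_size is up to you):

    from tensorflow.contrib import seq2seq

    # With one-hot character inputs, an identity matrix serves as "embedding".
    embedding = tf.eye(VOCAB_SIZE)
    # SampleEmbeddingHelper draws each next input by sampling from the
    # previous step's output distribution.
    helper = seq2seq.SampleEmbeddingHelper(
        embedding, start_tokens=tf.fill([batch_size], start_id),
        end_token=end_id)
    decoder = seq2seq.BasicDecoder(
        cell, helper, initial_state=cell.zero_state(batch_size, tf.float32),
        output_layer=tf.layers.Dense(VOCAB_SIZE))
    # Decodes until end_token is sampled (or the step limit is reached).
    outputs, final_state, lengths = seq2seq.dynamic_decode(
        decoder, maximum_iterations=500)
    sampled_ids = outputs.sample_id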

Applying a Language Model

Finally, keep in mind that language modeling is actually about assessing the “quality” of a piece of language, usually formalized via probabilities. The probability of a given sequence is simply the product, over time steps, of the model’s output probability for the character that actually appears next. Try this out: Compute the probabilities of some sequences typical of the training corpus (you could just take them straight from there) and compare them to the probabilities of some not-so-typical sequences. Note that only sequences of the same length can be compared, since longer sequences automatically receive lower probabilities. For example, in the King James corpus you could simply replace the word “LORD” by “LOOD” somewhere and see how this affects the probability of the overall sequence.
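Since multiplying many probabilities quickly underflows, it is more robust to sum log-probabilities instead; a sketch for a single sequence (step_probs and target_ids are assumed to come from a predict-mode run):

    import numpy as np

    # step_probs: [max_time, VOCAB_SIZE] softmax outputs for one sequence;
    # target_ids: IDs of the characters that actually appear next at each step.
    log_prob = np.sum(np.log(step_probs[np.arange(len(target_ids)), target_ids]))
    # log_prob is log P(sequence); apply np.exp if you want the raw probability.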

In the Estimator interface, getting “your own” sequences into the network can be a bit annoying since you need to work via input functions. You have (at least) two options:

  1. Prepare a text file that can be processed by the provided data preparation script.
  2. Try writing your own input function that e.g. takes input from the console. This is probably easiest using tf.data.Dataset.from_generator – this allows you to run arbitrary Python code within a generator and yield inputs to the model function as you deem appropriate (see the sketch after this list).
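A sketch of the console variant (char2id is again a hypothetical lookup table, and the trailing batch(1) produces the batches predict mode expects):

    def console_input_fn():
        def gen():
            while True:
                line = input('sequence> ')  # read one line from the console
                yield [char2id[c] for c in line]  # map characters to IDs
        dataset = tf.data.Dataset.from_generator(
            gen, output_types=tf.int32, output_shapes=[None])
        return dataset.batch(1)  # a batch of one sequence at a time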

Then you will want to run your model in predict mode and get the probabilities that way.
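For instance (estimator being your tf.estimator.Estimator instance, and 'probs' an assumed key that your model function puts into its predictions dict):

    for prediction in estimator.predict(input_fn=console_input_fn):
        print(prediction['probs'])  # per-step output distributions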