WS 2017/18

Introduction to Deep Learning

 

Assignment 6: More Realistic Language Modeling & Recurrent Neural Networks

In this task, we will once again deal with the issue of language modeling. This time, however, we will be using TensorFlow’s RNN functionality, which makes defining models significantly easier. On the flip side, we will have to deal with the issues that come with variable-length inputs. We will stick with character-level models for now; while word-level models are more common in practice, they come with their own problems that we will deal with at a later time.

Preparing the Data

Once again, we provide you with a script to help you process raw text into a format TensorFlow can understand. Download the script here. This script differs from the previous one in a few regards:

This also means that you need to provide the data to TensorFlow in a slightly different way during training:
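The exact layout depends on the script, but assuming it yields integer-encoded character sequences of varying length, one plausible arrangement is to pad each batch to a common length and pass the true lengths alongside it. A minimal sketch, with purely illustrative names:

```python
import numpy as np

def make_batch(sequences, pad_id=0):
    """Pad integer-encoded sequences to a common length and return the
    padded matrix together with the true length of each sequence.
    Hypothetical helper; the actual format the script produces may differ."""
    lengths = np.array([len(seq) for seq in sequences], dtype=np.int32)
    padded = np.full((len(sequences), lengths.max()), pad_id, dtype=np.int32)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded, lengths

# During training, both the padded batch and the lengths are fed to the model,
# e.g. sess.run(train_op, {inputs: padded, targets: ..., seq_lens: lengths})
```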

Building an RNN

Defining an RNN is much simpler when using the full TensorFlow library. Again, there are essentially no official tutorials on this, so here are the basic steps:
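For orientation, here is a minimal sketch of such a model using the TensorFlow 1.x API (`tf.nn.rnn_cell`, `tf.nn.dynamic_rnn`); the placeholder names and sizes are assumptions, not part of the assignment:

```python
import tensorflow as tf

vocab_size = 100   # assumption: vocabulary size incl. padding and </S>
num_units = 256    # assumption: RNN state size

# Padded batches of character ids, the targets (inputs shifted by one step)
# and the true, unpadded length of each sequence.
inputs = tf.placeholder(tf.int32, [None, None], name="inputs")    # (batch, time)
targets = tf.placeholder(tf.int32, [None, None], name="targets")  # (batch, time)
seq_lens = tf.placeholder(tf.int32, [None], name="seq_lens")      # (batch,)

# One-hot encode the characters (an embedding lookup would work just as well).
rnn_inputs = tf.one_hot(inputs, vocab_size)

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Making the initial state feedable (defaulting to the zero state) comes in
# handy for sampling later on.
batch_size = tf.shape(inputs)[0]
zero_state = cell.zero_state(batch_size, tf.float32)
state_c = tf.placeholder_with_default(zero_state.c, [None, num_units])
state_h = tf.placeholder_with_default(zero_state.h, [None, num_units])
init_state = tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h)

# dynamic_rnn unrolls the cell over the time axis; sequence_length tells it
# where each padded sequence really ends.
outputs, final_state = tf.nn.dynamic_rnn(
    cell, rnn_inputs, sequence_length=seq_lens, initial_state=init_state)

# Per-step logits over the vocabulary.
logits = tf.layers.dense(outputs, vocab_size)   # (batch, time, vocab)
```

Passing `sequence_length` makes `dynamic_rnn` copy the last valid state forward and zero the outputs beyond each sequence’s end, which already takes care of part of the padding problem.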

You may have noticed that there is one problem remaining: we padded shorter sequences to get valid tensors, but neither the RNN functions nor the cost computation has any way of knowing that we did this. This means we will most likely get nonsensical outputs (and thus costs) for all those sequence elements that correspond to padding. Let’s fix this.
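One way to do this, continuing the sketch above, is to build a binary mask from the sequence lengths and use it to zero out the per-step cross-entropy terms that belong to padding (TensorFlow 1.x also ships `tf.contrib.seq2seq.sequence_loss`, which accepts such weights directly):

```python
# Mask out the padded positions so they contribute nothing to the cost.
mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(inputs)[1], dtype=tf.float32)

xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)                    # (batch, time)
cost = tf.reduce_sum(xent * mask) / tf.reduce_sum(mask)

train_op = tf.train.AdamOptimizer().minimize(cost)
```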

While this was a lot of explanation, your final program should be far more succinct than the previous one, and it’s more flexible as well! Look at the computation graph of this network to see how compactly the RNN is represented. Experiment with different cell types or even multiple layers and see the effects on the cost. Also evaluate the quality of samples from the network.
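For reference, stacking layers or switching the cell type only changes how the cell is constructed; note that with a multi-layer cell, the feedable initial state from the sketch above would have to become a tuple of per-layer states:

```python
# Swapping the cell type or stacking several layers only changes how `cell`
# is built; the dynamic_rnn call and the cost stay exactly the same.
cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.GRUCell(num_units) for _ in range(2)])
```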

Sampling works a lot like before: we can input single characters by simply treating them as length-1 sequences. The process should be stopped not after a certain number of steps, but when the special character </S> is sampled (you could also continue sampling and see how your network breaks down…). Once again, supplying the initial state as a placeholder should help – note that if you use a MultiRNNCell, this needs to be a tuple of states.
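A possible shape for the sampling loop, continuing the single-layer LSTM sketch above; `GO_ID` and `EOS_ID` are hypothetical integer ids for the sequence-start symbol and `</S>` (the actual symbols depend on the preprocessing script):

```python
import numpy as np

probs = tf.nn.softmax(logits)   # per-step probabilities over the vocabulary

def sample(sess, max_len=500):
    """Sample one sequence, character by character, until </S> appears."""
    state = sess.run(cell.zero_state(1, tf.float32))
    char, result = GO_ID, []     # GO_ID / EOS_ID are assumed, script-dependent ids
    while len(result) < max_len:
        p, state = sess.run(
            [probs, final_state],
            {inputs: [[char]], seq_lens: [1],
             state_c: state.c, state_h: state.h})
        dist = p[0, 0] / p[0, 0].sum()          # renormalize against float error
        char = int(np.random.choice(len(dist), p=dist))
        if char == EOS_ID:                      # stop once </S> is sampled
            break
        result.append(char)
    return result
```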

Finally, keep in mind that language modeling is actually about assessing the “quality” of a piece of language, usually formalized via probabilities. The probability of a given sequence is simply the product, over all time steps, of the model’s probability for the character that actually appears next (i.e. apply a softmax to the logits and pick out the target character’s probability at each step). Try this out: compute the probabilities for some sequences typical of the training corpus (you could just take them straight from there) and compare them to the probabilities for some not-so-typical sequences. Note that only sequences of the same length can be compared, since longer sequences automatically receive a lower probability. For example, in the King James corpus you could simply replace the word “LORD” with “LOOD” somewhere and see how this affects the probability of the overall sequence.
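One way to compute such scores with the graph sketched above; `encode` is a hypothetical helper mapping a string to the integer ids used during training (including the start and `</S>` symbols), and summing log-probabilities is just the numerically stable version of taking the product:

```python
def sequence_log_prob(sess, ids):
    """Log-probability of an integer-encoded sequence under the model."""
    feed = {inputs: [ids[:-1]], seq_lens: [len(ids) - 1]}
    p = sess.run(probs, feed)[0]                     # (time, vocab)
    step_p = p[np.arange(len(ids) - 1), ids[1:]]     # prob. of the true next char
    return np.sum(np.log(step_p))

# e.g. compare sequences of the same length from the King James corpus:
# sequence_log_prob(sess, encode("the LORD said"))
# sequence_log_prob(sess, encode("the LOOD said"))
```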