RNN Addition (1st Grade)

Ever since I ran across RNNs, they have intrigued me with their ability to learn. The best background is Denny Britz’s tutorial, Karpathy’s totally accessible and fun post on character-level language models, and Colah’s detailed descriptions of LSTMs. Besides all the fun examples of generating content with RNNs, other people have been applying them and winning Kaggle competitions and the ECML/PKDD challenge.

I am still blown away by how RNN’s can learn to add. RNNs are trained through thousands of examples and can learn how to sum numbers. For example, the Keras addition example show how to add two sets of numbers up to 5 digital long each (e.g., 54678 + 78967). It achieves 99% train/test accuracy in 30 epochs with a one layer LSTM (128 HN) and 550k training examples.

My eventual goal is to use RNNs to study various sequenced data (such as the NBA SportVu), so I thought I should start simple. I wanted to teach a RNN to add a series of numbers. For example: 5+7+9. The rest of the post discusses this journey.

1st Grade Model

My first model was teaching an RNN to add between 5 to 15 single digit numbers. This would be at the level of a first grader in the US. For example, using a 2 layer LSTM network with 100 hidden units, a batch of 50 training examples, and 5000 epochs, the RNN summed up:

8+6+4+4+0+9+1+1+7+3+9+2+8 as 66.2154007

This isn’t too far from the actual answer of 62. The Keras addition example show that with even more examples/training, the RNN can get much better. The code for this RNN is available as a gist using tensorflow. I made this in a notebook format so its easy to play with.

There are lots of parameters to tweak with RNN models, such as the number of hidden units, epochs, batch size, dropout, and training rate. Each of these has different sorts of effects on the model. For example, increasing the number of hidden units will provide more space for learning, but consequently take longer to learn/train. The chart below shows the effect of different choices. Please take the time to really study/investigate the role of hidden units. Its a dynamic plot so you can zoom in and examine each series individually by clicking on the legend.

Cost by epoch