Eat Melon! A Deep Q Reinforcement Learning Demo

Description

This demo is a fun way to gain familiarity with the Deep Q Learning algorithm. It is based on Karpathy's ConvNetJS Deep Q Learning Demo, with some tweaks on the UI side. You can fiddle with the reward and model parameters and see how they affect the agent, which should give you a feel for how Deep Q learning works. Deep Q learning is best known for learning to play Atari games. You can check out the original paper or train models on the Atari games using OpenAI's tools.

More detailed guidelines are below (or watch a demo video). Please remember to reload the network after making changes to the reward or model parameters. The agent takes about fifteen minutes to train with the current settings. If you're impatient, scroll down and load an example pre-trained network from pre-filled JSON.

State Visualizations

Average reward over time (this should go up as the agent gets better, on average, at collecting rewards)

Controls

Watermelon 5 Poop -6
Avoid Wall Reward 1 Forward Reward 1

X Food Speed 0 Y Food Speed 0

Epsilon parameter 5 Gamma parameter 70

Using the model

This demo has a 2D agent with 9 eyes pointing at different angles ahead. Every eye senses 3 values along its direction (up to a certain maximum visibility distance): distance to a wall, distance to watermelon, and distance to poop. Together these make up the input state (27 values total). The agent has 5 possible actions: go straight, turn slightly to either side while moving forward, or turn more sharply to either side while moving ahead only a little. To make life a bit more difficult for the agent, you can use the speed controls to make the food move in the X and Y directions.
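To make the 27-value input concrete, here is a minimal Python sketch of how 9 eye readings could be flattened into one state vector. It is an illustration, not the demo's actual code: the names `EyeReading` and `build_state`, the `MAX_RANGE` value, and the choice to scale distances into [0, 1] are all assumptions.

```python
from dataclasses import dataclass

NUM_EYES = 9
MAX_RANGE = 85.0  # hypothetical maximum visibility distance


@dataclass
class EyeReading:
    """What a single eye senses; MAX_RANGE means 'nothing in sight'."""
    wall: float = MAX_RANGE
    watermelon: float = MAX_RANGE
    poop: float = MAX_RANGE


def build_state(eyes):
    """Flatten 9 eyes x 3 distances into a 27-value input vector.

    Distances are clipped to [0, MAX_RANGE] and scaled to [0, 1]
    (an assumed normalization, so 1.0 means nothing was seen).
    """
    if len(eyes) != NUM_EYES:
        raise ValueError(f"expected {NUM_EYES} eyes, got {len(eyes)}")
    state = []
    for eye in eyes:
        for distance in (eye.wall, eye.watermelon, eye.poop):
            clipped = min(max(distance, 0.0), MAX_RANGE)
            state.append(clipped / MAX_RANGE)
    return state


eyes = [EyeReading() for _ in range(NUM_EYES)]
eyes[0].watermelon = 42.5  # the first eye spots a watermelon halfway out
state = build_state(eyes)
print(len(state))  # 27
```

The network never sees the world directly, only this vector, which is why food outside the visibility distance cannot influence the agent's choice.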

Karpathy designed three ways to reward the agent. It can earn a reward for learning to go forward as well as for avoiding the walls. The third way is the most fun: you can reward it for eating watermelon and give it a negative reward for eating poop! (Hacking on this with a 10-year-old nearby led to the food choices here, although her first choice was a unicorn eating rainbow candy.) You can find sliders for controlling all three of these behaviors.
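One way to picture how the three reward sources combine is a simple weighted sum per time step. The sketch below is only that, a picture: the function `step_reward`, its arguments, and the weighted-sum form are assumptions, and the demo's real formula may scale or gate these terms differently. The default weights mirror the slider values shown under Controls.

```python
def step_reward(moved_forward, wall_clearance, melons_eaten, poops_eaten,
                forward_reward=1.0, wall_reward=1.0,
                melon_reward=5.0, poop_reward=-6.0):
    """Hypothetical per-step reward: a weighted sum of the three behaviors.

    wall_clearance is assumed to lie in [0, 1], where 1 means no wall nearby.
    """
    reward = wall_reward * wall_clearance          # avoiding walls
    if moved_forward:
        reward += forward_reward                   # going forward
    reward += melon_reward * melons_eaten          # eating watermelon
    reward += poop_reward * poops_eaten            # eating poop (negative)
    return reward


# Moving forward in open space and eating one watermelon:
print(step_reward(True, 1.0, 1, 0))  # 7.0
```

Setting `melon_reward` and `poop_reward` to zero in this sketch leaves only the movement terms, which corresponds to the "just teach the agent to move around" experiment.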

The Deep Q learning algorithm has a number of settings that can also be tweaked. The epsilon setting tells the agent what percent of the time to try a random action. Epsilon currently starts at 1 and slowly drops until it reaches the epsilon setting. The gamma setting determines how heavily the agent weights future rewards relative to immediate ones. A high gamma means more attention is given to far-away goals. In the models playing Atari, gamma was typically set close to 1. The numbers shown by the sliders are divided by 100 before they are used in the algorithm.
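The roles of epsilon and gamma can be sketched in a few lines of Python. This is a generic illustration of epsilon-greedy exploration and the one-step Q-learning target, not the demo's code: the function names, the `anneal_steps` parameter, and the linear decay schedule are assumptions (the demo only promises that epsilon "slowly drops").

```python
import random


def slider_to_param(slider_value):
    """Slider numbers are divided by 100 before use (5 -> 0.05, 70 -> 0.70)."""
    return slider_value / 100.0


def annealed_epsilon(step, epsilon_min, anneal_steps):
    """Assumed linear decay from 1.0 down to epsilon_min over anneal_steps."""
    fraction = min(step / anneal_steps, 1.0)
    return 1.0 + fraction * (epsilon_min - 1.0)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma):
    """One-step Q-learning target: reward plus discounted best future value."""
    return reward + gamma * max(next_q_values)


epsilon_min = slider_to_param(5)   # 0.05
gamma = slider_to_param(70)        # 0.7

q_values = [0.1, 0.9, 0.3, 0.0, 0.2]         # one estimate per action
greedy = choose_action(q_values, epsilon=0.0)  # never explores, picks index 1
target = td_target(1.0, [0.0, 2.0], gamma=0.5)  # 1.0 + 0.5 * 2.0 = 2.0
```

With gamma at 0.7, a reward ten steps away is weighted by roughly 0.7^10 ≈ 0.03, so the agent is fairly short-sighted; at 0.99 the same reward keeps about 0.90 of its value.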

To ensure your changes to the model apply, you should click the reload button. This will start the model over. Below the reload button is a textfield that gets eval()'d to produce the Q-learner for this demo. In this space you can adjust other parameters and settings. As an example, I have put in a variable for the number of food items created.

To save your model, use the save network to JSON button. The load button next to it lets you start from a pretrained network instead of training from scratch. This can be useful if you have trained the agent for a particular behavior.

The code for this demo is available on GitHub. Additionally, I have made a behind-the-scenes video where I walk through the code and summarize the changes I made (and explain why the graphics look crappy). There are lots more tweaks and improvements that can be done. Do it! It's a great way to start getting a feel for reinforcement learning.

Some fun things to try:

Just teach the agent to move around -- set the food rewards to zero.
Get the agent to just circle around -- set the food rewards and forward reward to zero.
Teach the agent to eat poop -- reverse the food rewards
Change the amount of food, starve versus feast and see how that affects behavior -- use the items_total in the text field
Add moving food and see how the agent reacts . . . compare its reward score to the score with non-moving food
Try changing the epsilon and gamma and see how that affects behavior

I/O

You can save and load a network from JSON here. Note that the textfield is prefilled with a pretrained network that works reasonably well. Just hit the load button!

About

You can find out more about me at my homepage, blog, Rajistics YouTube channel, LinkedIn, or Twitter.