Iterative Model-Based Reinforcement Learning Using Simulations in the Differentiable Neural Computer

My paper on model-based Reinforcement Learning (RL) using the Differentiable Neural Computer (DNC) was accepted at the Workshop on Multi-Task and Lifelong Reinforcement Learning at the 2019 International Conference on Machine Learning (ICML).

For this work I investigated the use of the DNC in the lifelong learning context. Lifelong learning differs slightly from multi-task learning: in multi-task learning, the goal is to learn multiple tasks simultaneously, whereas in lifelong learning the goal is to learn multiple tasks sequentially, without forgetting previously learned tasks.

In the paper I introduced the Neural Computer Agent, in which a DNC learns a model of the environment, and a paired agent is trained with the Proximal Policy Optimization (PPO) algorithm to maximize rewards in simulations generated by the DNC model.

Schematic of the iterative lifelong RL architecture, the Neural Computer Agent. An agent interacts with the environment to collect experience, which is used to train a predictive DNC model. The model is then used to simulate the environment to train the agent. The agent is then rolled out in the environment again to collect new experience, and the cycle repeats.
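To make the loop concrete, here is a minimal sketch of the collect/model/simulate cycle from the schematic. All of the names here (ToyEnv, WorldModel, Agent, run_iteration) are illustrative stubs I made up for this post, not the paper's implementation; in the paper the world model is a DNC and the agent is trained with PPO.

```python
import numpy as np

rng = np.random.default_rng(0)


class ToyEnv:
    """Trivial 1-D environment used only to make the sketch executable."""

    def reset(self):
        self.state = 0.0
        return np.array([self.state])

    def step(self, action):
        self.state += 1.0 if action == 1 else -1.0
        done = abs(self.state) >= 5.0
        reward = 1.0 if self.state >= 5.0 else 0.0
        return np.array([self.state]), reward, done


class WorldModel:
    """Stand-in for the DNC world model: predicts the next observation and reward."""

    def train(self, transitions):
        pass  # fit the predictive model on (obs, action, next_obs, reward) tuples

    def simulate_step(self, obs, action):
        # Imagined dynamics; the real model would roll the trained DNC forward.
        return obs + (1.0 if action == 1 else -1.0), float(obs[0] >= 4.0)


class Agent:
    """Stand-in for the PPO policy."""

    def act(self, obs):
        return int(rng.integers(0, 2))

    def train_in_simulation(self, model, start_obs, n_steps):
        obs = start_obs
        for _ in range(n_steps):
            action = self.act(obs)
            obs, reward = model.simulate_step(obs, action)
            # A PPO update would consume (obs, action, reward) here.


def run_iteration(env, model, agent, real_steps=200, sim_steps=1000):
    # 1. Roll the current policy out in the real environment to collect experience.
    transitions, obs = [], env.reset()
    for _ in range(real_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, next_obs, reward))
        obs = env.reset() if done else next_obs
    # 2. Fit the predictive model on the collected experience.
    model.train(transitions)
    # 3. Train the agent entirely inside simulations generated by the model.
    agent.train_in_simulation(model, env.reset(), sim_steps)


env, model, agent = ToyEnv(), WorldModel(), Agent()
for _ in range(3):  # repeat the collect / model / simulate cycle
    run_iteration(env, model, agent)
```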

I hypothesized that the DNC can be used to learn a global model of the environment, as opposed to the task-specific local models typically used in multi-task and lifelong learning. I first tested the DNC on an integer addition task whose difficulty progressively increased, and found that the DNC can leverage past knowledge to adapt to new tasks quickly, outperforming an LSTM by an order of magnitude. Additionally, the DNC continued to perform well on previously learned integer addition tasks after learning new ones.

DNC and LSTM performance over the course of training on multiple progressively difficult addition tasks. The curriculum advances every 5,000 training sequences until 30,000 sequences, after which the most difficult task remains fixed until training concludes.
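As a rough illustration of how such a curriculum can be generated, the sketch below samples integer-addition batches whose difficulty (digits per operand) advances every 5,000 sequences. The encoding and level schedule here are assumptions for illustration; the exact task format in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def to_digits(x, width):
    """Little-endian digit expansion of integer array x, zero-padded to a fixed width."""
    return np.stack([(x // 10 ** i) % 10 for i in range(width)], axis=-1)


def addition_batch(n_digits, batch_size=32):
    """One batch of integer-addition examples at a given difficulty level.

    Difficulty is controlled by the number of digits per operand; the sum of
    two n-digit numbers always fits in n + 1 digits.
    """
    a = rng.integers(0, 10 ** n_digits, size=batch_size)
    b = rng.integers(0, 10 ** n_digits, size=batch_size)
    width = n_digits + 1
    inputs = np.stack([to_digits(a, width), to_digits(b, width)], axis=-1)
    targets = to_digits(a + b, width)
    return inputs, targets


def curriculum_level(sequences_seen, step_size=5_000, max_level=6):
    """Advance one difficulty level every step_size sequences, then stay at the hardest."""
    return min(sequences_seen // step_size + 1, max_level)


# Example: sample a batch at the difficulty appropriate for 12,000 sequences seen.
inputs, targets = addition_batch(n_digits=curriculum_level(12_000))
```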

I tested the Neural Computer Agent on two toy RL environments, each containing multiple tasks. In both environments, the DNC learned an adequate model of all tasks, and the Neural Computer Agent iteratively solved each environment entirely in simulations.

The levels (tasks) in the Obstacle-Based Grid Navigation environment. The levels are designed to increase progressively in difficulty by adding obstacles. The agent starts at the top-left cell of the grid and has to reach the goal cell at the bottom right in the minimum number of steps.
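A minimal gridworld along these lines might look like the sketch below. The grid size, reward values, and obstacle layouts are assumptions for illustration, not the paper's exact levels.

```python
class ObstacleGrid:
    """Minimal gridworld in the spirit of the environment described above.

    The agent starts at the top-left cell and must reach the bottom-right goal;
    harder levels add obstacle cells that block movement.
    """

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, size=5, obstacles=()):
        self.size = size
        self.obstacles = set(obstacles)
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        if (r, c) not in self.obstacles:  # moves into obstacles leave the agent in place
            self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else -0.01   # small step penalty encourages short paths
        return self.pos, reward, done


# Three progressively harder levels; the obstacle layouts are made up for illustration.
levels = [
    ObstacleGrid(obstacles=()),
    ObstacleGrid(obstacles=((2, 2),)),
    ObstacleGrid(obstacles=((2, 2), (1, 3), (3, 1))),
]
```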

Link to full paper: https://arxiv.org/abs/1906.07248
