Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings

With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not fit readily into this paradigm because agent-environment interaction is costly and time-consuming. Consequently, many offline RL algorithms have recently been proposed for learning robotic tasks. Yet most such methods focus on single-task or multi-task learning, which requires retraining every time a new task is to be learned. Combining continual learning, i.e. learning tasks one after another without forgetting previous knowledge, with the power of offline deep RL would allow the number of tasks to scale simply by adding them sequentially. In this paper, we investigate the effectiveness of regularisation-based methods such as synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline RL setup. We evaluate this combined framework against two common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyze the effect of task ordering, and we also investigated the effect of the number of object configurations and the density of robot trajectories. We found that learning tasks sequentially helps propagate knowledge from previous tasks, thereby reducing the time required to learn a new task. Although regularisation-based approaches to continual learning, such as synaptic intelligence, help mitigate catastrophic forgetting, they have shown only limited transfer of knowledge from previous tasks.
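
The abstract names synaptic intelligence (SI) as the regularisation-based method under study but does not spell out how it works. The following is a minimal sketch that assumes the original SI formulation (Zenke et al., 2017). During training on a task, each parameter θ_k accumulates a path integral ω_k = Σ −g_k Δθ_k. At the end of the task, this sum is normalised into an importance Ω_k += ω_k / (Δ_k² + ξ). Later tasks then add a quadratic penalty c Σ_k Ω_k (θ_k − θ̃_k)², which anchors important parameters near their previous values θ̃_k. The class `SynapticIntelligence`, the helper `train_task`, the two toy quadratic tasks, and the values of `c`, `xi`, `lr` and `steps` are all hypothetical. This is not the paper's offline-RL implementation. In a deep-RL setting, the same bookkeeping would run over every network weight rather than over two scalars.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Vector = List[float]


@dataclass
class SynapticIntelligence:
    """Hypothetical SI bookkeeping over a flat list of scalar parameters (sketch only)."""

    c: float = 1.0    # regularisation strength (assumed value)
    xi: float = 1e-3  # damping term that keeps the importance finite
    omega: Vector = field(default_factory=list)       # consolidated importance Omega_k
    path: Vector = field(default_factory=list)        # running path integral omega_k for the current task
    anchor: Vector = field(default_factory=list)      # parameters at the end of the previous task
    task_start: Vector = field(default_factory=list)  # parameters at the start of the current task

    def begin_task(self, theta: Vector) -> None:
        if not self.omega:  # first task: no importance and no anchor yet
            self.omega = [0.0] * len(theta)
            self.anchor = list(theta)
        self.path = [0.0] * len(theta)
        self.task_start = list(theta)

    def accumulate(self, task_grad: Vector, step: Vector) -> None:
        # Path-integral contribution of one update: -g_k * delta_theta_k.
        for k, (g, d) in enumerate(zip(task_grad, step)):
            self.path[k] -= g * d

    def end_task(self, theta: Vector) -> None:
        # Omega_k += omega_k / (Delta_k^2 + xi), then re-anchor at the new parameters.
        for k, t in enumerate(theta):
            delta = t - self.task_start[k]
            self.omega[k] += self.path[k] / (delta ** 2 + self.xi)
        self.anchor = list(theta)

    def penalty_grad(self, theta: Vector) -> Vector:
        # Gradient of c * sum_k Omega_k * (theta_k - anchor_k)^2.
        return [2.0 * self.c * o * (t - a) for o, t, a in zip(self.omega, theta, self.anchor)]


def train_task(si: SynapticIntelligence, theta: Vector,
               task_grad: Callable[[Vector], Vector],
               steps: int = 200, lr: float = 0.05) -> Vector:
    """Plain gradient descent on task loss + SI penalty (hypothetical helper)."""
    si.begin_task(theta)
    for _ in range(steps):
        g_task = task_grad(theta)
        g_reg = si.penalty_grad(theta)
        new_theta = [t - lr * (gt + gr) for t, gt, gr in zip(theta, g_task, g_reg)]
        # Assumption: the path integral uses the task-loss gradient only, not the SI penalty.
        si.accumulate(g_task, [n - t for n, t in zip(new_theta, theta)])
        theta = new_theta
    si.end_task(theta)
    return theta


# Toy tasks: A wants theta[0] = 1; B wants theta[0] = -1 and theta[1] = 2.
si = SynapticIntelligence(c=1.0)
theta_a = train_task(si, [0.0, 0.0], lambda th: [2 * (th[0] - 1), 0.0])
theta_b = train_task(si, theta_a, lambda th: [2 * (th[0] + 1), 2 * (th[1] - 2)])
```

In this toy run, task A gives θ_0 a large importance. Training on task B therefore leaves θ_0 at roughly 0.025 instead of the −1 that unregularised fine-tuning would reach. Meanwhile θ_1, which task A never used, moves freely to B's optimum of 2. This illustrates the trade-off the abstract reports: forgetting is reduced by restricting how far important parameters can move.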
