We consider a Continual Reinforcement Learning setup, in which a learning agent must continuously adapt to new tasks while retaining previously acquired skills, with a focus on avoiding the forgetting of past knowledge and on ensuring scalability as the number of tasks grows. Such issues prevail in autonomous robotics and video game simulations, notably for navigation tasks prone to topological or kinematic changes. To address them, we introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data. Our method leverages distinct policy subspaces of neural networks to enable flexible and efficient adaptation to new tasks while preserving existing knowledge. Through a careful experimental study, we demonstrate the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations, showing competitive performance and satisfactory adaptability with respect to standard continual learning metrics, in particular memory usage and efficiency.