In dynamic domains such as autonomous robotics and video game simulations, agents must continuously adapt to new tasks while retaining previously acquired skills. This ongoing process, known as Continual Reinforcement Learning, presents significant challenges, including the risk of forgetting past knowledge and the need for solutions that scale as the number of tasks grows. To address these issues, we introduce HIerarchical LOW-rank Subspaces of Policies (HILOW), a novel framework designed for continual learning in offline navigation settings. HILOW leverages hierarchical policy subspaces to enable flexible and efficient adaptation to new tasks while preserving existing knowledge. Through a careful experimental study, we demonstrate the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like simulations, showing competitive performance and satisfactory adaptability on standard continual learning metrics, particularly with respect to memory usage. Our work provides a promising framework for real-world applications where continual learning from pre-collected data is essential.