The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods typically rely on either fixed-size models, which struggle to learn a large number of diverse behaviors, or growing-size models, which scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method that grows adaptively depending on the task sequence. We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks. The subspace's high expressivity allows CSP to perform well on many different tasks while growing sublinearly with the number of tasks. Our method does not suffer from forgetting and displays positive transfer to new tasks. CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
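To make the idea of an incrementally built subspace of policies concrete, the following is a minimal sketch and not the method's actual implementation. It assumes that the subspace is the set of convex combinations of a few "anchor" parameter vectors, and that a new anchor is kept only when it improves a task score by more than a threshold. The abstract states neither detail, and the names `combine`, `grow_or_prune`, and `threshold` are hypothetical.

```python
import numpy as np


def combine(anchors, weights):
    """Return policy parameters as a convex combination of anchor vectors.

    Assumption: the policy subspace is the convex hull of the anchors.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (len(anchors),):
        raise ValueError("need exactly one weight per anchor")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Weighted sum over the anchor axis: (k,) x (k, d) -> (d,)
    return np.tensordot(weights, np.stack(anchors), axes=1)


def grow_or_prune(anchors, new_anchor, score_old, score_new, threshold=0.0):
    """Hypothetical growth rule: keep the new anchor only if it helps enough.

    score_old: best score reachable in the current subspace (assumed given).
    score_new: best score reachable once new_anchor is added (assumed given).
    """
    if score_new > score_old + threshold:
        return anchors + [new_anchor]  # subspace grows by one anchor
    return anchors  # subspace size unchanged for this task


anchors = [np.array([0.0, 0.0]), np.array([2.0, 4.0])]
midpoint = combine(anchors, [0.5, 0.5])
```

Under these assumptions, the number of anchors increases only on tasks where the existing subspace is insufficient, which is one way a model could grow sublinearly with the number of tasks.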