Relying on large language models (LLMs), embodied robots can perform complex multimodal manipulation tasks from visual observations with strong generalization ability. However, most visual behavior-cloning agents suffer from degraded manipulation performance and forgetting of skill knowledge when adapting to a series of challenging unseen tasks. To address this challenge, we present NBCagent, a pioneering language-conditioned Never-ending Behavior-Cloning agent for embodied robots, which continually learns observation knowledge of novel robot manipulation skills from skill-specific and skill-shared attributes. Specifically, we establish a skill-specific evolving planner to perform knowledge decoupling, which continually embeds novel skill-specific knowledge into NBCagent from the latent and low-rank spaces. Meanwhile, we propose a skill-shared semantics rendering module and a skill-shared representation distillation module to effectively transfer anti-forgetting skill-shared knowledge, further tackling catastrophic forgetting of old skills from the semantics and representation aspects. Finally, we design a continual embodied robot manipulation benchmark, and extensive experiments demonstrate the significant performance of our method. Visual results, code, and dataset are provided at: https://neragent.github.io.
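As a rough illustration of two ideas named in the abstract, the sketch below shows a low-rank skill-specific weight update and a feature-level distillation penalty against a frozen old model. It is not the NBCagent implementation: the function names (`low_rank_delta`, `adapted_forward`, `distillation_loss`), the choice of mean squared error, and the toy matrices are all assumptions made for this example.

```python
# Illustrative sketch only; not the NBCagent implementation.
# All names and values here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def low_rank_delta(B, A):
    """Skill-specific update of rank r: B is (d_out x r), A is (r x d_in)."""
    return matmul(B, A)

def adapted_forward(W, B, A, x):
    """Apply the shared weight W plus the low-rank update B @ A to the vector x."""
    delta = low_rank_delta(B, A)
    W_eff = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_eff]

def distillation_loss(old_features, new_features):
    """Mean squared error between frozen old-model features and current features
    (MSE is an assumed choice of distance)."""
    n = len(old_features)
    return sum((o - f) ** 2 for o, f in zip(old_features, new_features)) / n

W = [[1.0, 0.0], [0.0, 1.0]]   # shared weight (identity, for readability)
B = [[1.0], [0.0]]             # d_out x r, with rank r = 1
A = [[0.0, 0.5]]               # r x d_in
x = [2.0, 4.0]                 # toy input feature vector

old = adapted_forward(W, [[0.0], [0.0]], A, x)  # zero B disables the update
new = adapted_forward(W, B, A, x)               # update for the new skill applied
loss = distillation_loss(old, new)              # penalty for drifting from old features
```

In this toy setting only the small factors `B` and `A` would be trained for a new skill while `W` stays shared, and the distillation term grows as the new features drift from those of the old model.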