Training robots with reinforcement learning (RL) typically requires extensive interaction with the environment, and the acquired skills are often sensitive to changes in the task environment and in robot kinematics. Transfer RL aims to leverage prior knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations and to scale to realistic tasks, because they rely on complex architecture designs or on strong regularization that limits the capacity of the learned policy. We propose Policy Stitching, a novel framework that facilitates robot transfer learning for novel combinations of robots and tasks. Our key idea is to apply modular policy design and to align the latent representations at the interface between modules. Our method allows robot and task modules that were trained separately to be stitched together directly, forming a new policy for fast adaptation. Our simulated and real-world experiments on various 3D manipulation tasks demonstrate the superior zero-shot and few-shot transfer learning performance of our method. Our project website is at: http://generalroboticslab.com/PolicyStitching/.
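To make the idea of stitching separately trained modules concrete, the following is a minimal toy sketch in plain Python, not the implementation behind the reported experiments. Every name in it (`ModularPolicy`, `stitch`, `relative_latent`, `ANCHOR_STATES`, the toy encoders and the toy robot head) is hypothetical. The alignment step is also an assumption: the abstract does not specify how latents are aligned, so the sketch uses one possible choice, re-expressing each latent by its cosine similarity to a fixed set of anchor latents, which is unchanged when two encoders differ only by a rotation of the latent space.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical anchor task states shared by all policies (an assumption).
ANCHOR_STATES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def relative_latent(z: Sequence[float], anchors: Sequence[Sequence[float]]) -> Vector:
    """Re-express a latent as its similarities to the anchor latents."""
    return [cosine(z, a) for a in anchors]


def rotate(v: Sequence[float], theta: float) -> Vector:
    """Rotate a 2-D vector; stands in for two training runs that
    learn the same latent geometry in differently oriented spaces."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


@dataclass
class ModularPolicy:
    task_module: Callable[[Vector], Vector]            # task state -> interface latent
    robot_module: Callable[[Vector, Vector], Vector]   # (latent, robot state) -> action

    def act(self, task_state: Vector, robot_state: Vector) -> Vector:
        return self.robot_module(self.task_module(task_state), robot_state)


def stitch(task_source: ModularPolicy, robot_source: ModularPolicy) -> ModularPolicy:
    """Combine the task module of one policy with the robot module of another."""
    return ModularPolicy(task_source.task_module, robot_source.robot_module)


def make_task_module(encoder: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Wrap a raw encoder so its output is expressed relative to the anchors."""
    anchors = [encoder(s) for s in ANCHOR_STATES]
    return lambda task_state: relative_latent(encoder(task_state), anchors)


def robot_module(latent: Vector, robot_state: Vector) -> Vector:
    """Toy action head: one action value per robot state entry."""
    return [sum(latent) + r for r in robot_state]


# Two policies whose raw task encoders differ by a rotation of the latent space.
policy_a = ModularPolicy(make_task_module(lambda s: list(s)), robot_module)
policy_b = ModularPolicy(make_task_module(lambda s: rotate(s, 0.7)), robot_module)

# Stitch the task module from policy B onto the robot module from policy A.
stitched = stitch(task_source=policy_b, robot_source=policy_a)

task_state, robot_state = [0.3, -1.2], [0.5, 0.1, -0.4]
action_a = policy_a.act(task_state, robot_state)
action_stitched = stitched.act(task_state, robot_state)
```

In this toy setting, `action_stitched` matches `action_a` up to floating-point error, because cosine similarity to the anchors is preserved under rotation. Learned modules would generally not differ by an exact rotation, which is presumably where the few-shot adaptation mentioned in the abstract comes in.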