We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to solve long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently performing multi-task RL on auxiliary tasks that are easier to learn than, and relevant to, the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks to the main task. We demonstrate that AuxDistill can learn a pixels-to-actions policy for a challenging multi-stage embodied object rearrangement task from the environment reward alone, without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves $2.3\times$ higher success than the previous state-of-the-art baseline on the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.
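The weighted distillation loss mentioned above can be sketched as follows. This is a minimal illustration of one plausible form, not the paper's exact objective: each auxiliary-task policy acts as a teacher, and a per-task weight scales a cross-entropy term that pulls the main-task policy toward the teacher's action distribution. The function name, signature, and weighting scheme are all assumptions for illustration.

```python
import numpy as np

def weighted_distillation_loss(main_logits, aux_logits_list, weights):
    """Hypothetical sketch of a weighted distillation loss.

    main_logits:     (batch, n_actions) logits of the main-task (student) policy.
    aux_logits_list: list of (batch, n_actions) logits, one teacher per auxiliary task.
    weights:         per-task scalars controlling how strongly each auxiliary
                     task's behavior is transferred to the main task.
    """
    def softmax(x):
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def log_softmax(x):
        z = x - x.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    log_p_main = log_softmax(main_logits)  # student log-probabilities
    loss = 0.0
    for w, teacher_logits in zip(weights, aux_logits_list):
        p_teacher = softmax(teacher_logits)  # teacher action distribution
        # Weighted cross-entropy between teacher and student, averaged over the batch.
        loss += w * -(p_teacher * log_p_main).sum(axis=-1).mean()
    return loss
```

In practice such a loss would be added to the standard RL objective, with the weights determining when behavior from each auxiliary task is distilled into the main-task policy.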