Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for fine-tuning generalist policies. Through extensive real-world experiments on precise manipulation tasks such as connector insertion and assembly, we demonstrate that generalist policies trained on RL-generated data consistently outperform those trained on human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis revealing that this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website: https://generalist-distillation.github.io
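The distillation pipeline described above (train a task-specific RL policy, keep only its successful rollouts, then fine-tune a generalist on that data via supervised learning) can be sketched in miniature. This is a hypothetical illustration, not the paper's code: the toy environment, the linear `rl_policy`, and the bucket-averaging stand-in for behavior cloning are all our own assumptions.

```python
import random

def collect_rl_rollouts(rl_policy, env_step, init_state,
                        num_episodes=20, horizon=30):
    """Roll out a trained task-specific RL policy and keep the
    (state, action) pairs from successful episodes only, mimicking
    RLDG's high-quality data generation step (hypothetical helper)."""
    dataset = []
    for _ in range(num_episodes):
        state, traj = init_state(), []
        for _ in range(horizon):
            action = rl_policy(state)
            next_state, success = env_step(state, action)
            traj.append((state, action))
            state = next_state
            if success:
                dataset.extend(traj)  # keep only successful trajectories
                break
    return dataset

def distill(dataset):
    """Stand-in for supervised fine-tuning of a generalist policy:
    average the RL actions observed in each discretized state bucket."""
    buckets = {}
    for state, action in dataset:
        buckets.setdefault(round(state, 1), []).append(action)
    return {s: sum(a) / len(a) for s, a in buckets.items()}

# Toy 1-D "insertion" task: state is distance to the goal; the RL
# policy halves the distance each step; success when within 0.05.
random.seed(0)
init_state = lambda: random.uniform(0.5, 1.5)
rl_policy = lambda s: -0.5 * s
env_step = lambda s, a: (s + a, abs(s + a) < 0.05)

dataset = collect_rl_rollouts(rl_policy, env_step, init_state)
bc_policy = distill(dataset)
```

In the real method, `distill` would be gradient-based fine-tuning of a robotic foundation model on the RL-generated demonstrations; the sketch only shows the data flow from specialist rollouts to generalist training data.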