Recent progress on vision-language foundation models has brought significant advances in building general-purpose robots. By using pre-trained models to encode the scene and instructions as inputs for decision making, an instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases when given an unseen task or environment. In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). When deploying the trained policy to a new task or a new environment, we first let the policy play with randomly generated instructions and record the resulting demonstrations. Although the execution may be wrong, we can use the pre-trained foundation models to provide feedback to relabel the demonstrations. This automatically provides new demonstration-instruction pairs for policy fine-tuning. We evaluate our method on a broad range of experiments focusing on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show that PAFF improves over the baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/PAFF/
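
The play-and-relabel procedure described above can be illustrated with a minimal sketch. This is not the PAFF implementation: every name here (`Demo`, `play`, `relabel`, `collect_adaptation_data`, `toy_rollout`, `toy_score`) is hypothetical, string "frames" stand in for image observations, and a word-overlap score stands in for the pre-trained vision-language model. The sketch also assumes that relabeling means picking the candidate instruction that best matches the recorded demonstration, which the abstract does not spell out.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = str  # assumption: a text tag stands in for an image observation


@dataclass
class Demo:
    frames: List[Frame]
    instruction: str


def play(rollout: Callable[[str], List[Frame]],
         instructions: Sequence[str],
         num_episodes: int,
         rng: random.Random) -> List[List[Frame]]:
    """Run the policy on randomly sampled instructions and record what it did."""
    return [rollout(rng.choice(instructions)) for _ in range(num_episodes)]


def relabel(frames: List[Frame],
            candidates: Sequence[str],
            score: Callable[[List[Frame], str], float]) -> Demo:
    """Pair a recorded demonstration with the best-scoring candidate instruction."""
    best = max(candidates, key=lambda text: score(frames, text))
    return Demo(frames=frames, instruction=best)


def collect_adaptation_data(rollout: Callable[[str], List[Frame]],
                            instructions: Sequence[str],
                            score: Callable[[List[Frame], str], float],
                            num_episodes: int = 4,
                            seed: int = 0) -> List[Demo]:
    """Play, then relabel; the returned pairs would feed policy fine-tuning."""
    rng = random.Random(seed)
    recorded = play(rollout, instructions, num_episodes, rng)
    return [relabel(frames, instructions, score) for frames in recorded]


# --- Toy stand-ins (hypothetical, for illustration only) ---

INSTRUCTIONS = ["push red block", "push blue block", "open drawer"]


def toy_rollout(instruction: str) -> List[Frame]:
    # A deliberately imperfect policy: it ignores the instruction
    # and always ends up pushing the red block.
    return ["gripper near red block", "red block pushed"]


def toy_score(frames: List[Frame], text: str) -> float:
    # Stand-in for foundation model feedback: the fraction of
    # instruction words that also appear in the frame tags.
    seen = set(" ".join(frames).split())
    words = text.split()
    return sum(word in seen for word in words) / len(words)


demos = collect_adaptation_data(toy_rollout, INSTRUCTIONS, toy_score)
for demo in demos:
    print(demo.instruction, "<-", demo.frames[-1])
```

Even though the toy policy never follows the sampled instruction, each recorded demonstration ends up paired with an instruction that describes what actually happened, which is the property that makes the relabeled pairs usable for fine-tuning.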