Developing generalizable robot policies that robustly handle varied environmental conditions and object instances remains a fundamental challenge in robot learning. While considerable effort has gone into collecting large robot datasets and designing policy architectures to learn from them, naively learning from raw visual inputs often yields brittle policies that fail to transfer beyond the training data. This work presents Prescriptive Point Priors for Policies (P3-PO), a novel framework that leverages recent advances in computer vision and robot learning to construct a unique state representation of the environment, improving out-of-distribution generalization for robot manipulation. The representation is obtained in two steps. First, a human annotator prescribes a set of semantically meaningful points on a single demonstration frame. These points are then propagated through the dataset using off-the-shelf vision models. The resulting points serve as input to state-of-the-art policy architectures for policy learning. Our experiments across four real-world tasks demonstrate an overall 43% absolute improvement over prior methods when evaluated in settings identical to training. Further, P3-PO exhibits 58% and 80% gains across tasks for novel object instances and more cluttered environments, respectively. Videos illustrating the robot's performance are best viewed at point-priors.github.io.
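The second step above, propagating human-prescribed points to the rest of the dataset, can be sketched as a dense feature-correspondence lookup. The snippet below is a minimal illustration only: it matches each annotated point to its nearest neighbor in feature space, standing in for the off-the-shelf vision models the paper uses (the `propagate_points` function and its feature-map inputs are hypothetical, not the authors' implementation).

```python
import numpy as np

def propagate_points(ref_feats, new_feats, ref_points):
    """Illustrative point propagation via nearest-neighbor feature matching.

    ref_feats, new_feats: (H, W, D) dense per-pixel feature maps
        (in practice these would come from a pretrained vision model).
    ref_points: (N, 2) integer array of (row, col) annotations on the
        reference frame.
    Returns an (N, 2) array of matched (row, col) locations in the new frame.
    """
    H, W, D = new_feats.shape
    flat = new_feats.reshape(-1, D)                   # (H*W, D) candidate pixels
    out = np.empty_like(ref_points)
    for i, (r, c) in enumerate(ref_points):
        query = ref_feats[r, c]                       # feature at the annotated point
        dists = np.linalg.norm(flat - query, axis=1)  # distance to every pixel
        out[i] = divmod(int(np.argmin(dists)), W)     # best match, back to (row, col)
    return out
```

The per-frame point coordinates produced this way form a compact, object-centric state that can replace raw images as the policy input, which is what allows the learned policy to ignore distractors and appearance changes.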