Despite recent advances of vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks, owing to their exclusive reliance on behavior cloning from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs at the trajectory level and implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 44.31% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
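To make the idea of trajectory-level preference alignment with implicit reward modeling concrete, the sketch below shows a generic DPO-style preference loss applied to whole trajectories rather than single actions. This is an illustrative stand-in, not GRAPE's exact objective: the function name, the `beta` temperature, and the use of summed per-step log-probabilities against a frozen reference policy are all assumptions of this sketch.

```python
import math

def trajectory_preference_loss(logp_chosen, logp_rejected,
                               ref_logp_chosen, ref_logp_rejected,
                               beta=0.1):
    """Illustrative DPO-style preference loss over whole trajectories.

    Each argument is a list of per-step action log-probabilities for one
    rollout, under either the policy being trained or a frozen reference.
    The implicit reward of a trajectory is beta times how much the policy's
    total log-probability exceeds the reference's; the loss is the
    Bradley-Terry negative log-likelihood that the preferred (e.g.
    successful) trajectory beats the dispreferred (e.g. failed) one.
    """
    # Implicit trajectory reward: beta * (policy logp - reference logp),
    # summed over all steps of the rollout.
    r_chosen = beta * (sum(logp_chosen) - sum(ref_logp_chosen))
    r_rejected = beta * (sum(logp_rejected) - sum(ref_logp_rejected))
    # Negative log-sigmoid of the reward margin: minimized when the
    # chosen trajectory is assigned a clearly higher implicit reward.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))
```

With no preference signal (identical trajectories) the loss sits at log 2, and it shrinks as the policy puts more probability mass on the preferred rollout relative to the reference, which is how both successful and failed trials contribute to learning without an explicit reward model.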