Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose, and dexterous robotic systems. Despite these advances, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, which degrades their generalization beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual cues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, PCD can be used as a plugin to improve different types of robot policies without fine-tuning or access to model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. Results in both simulation and real-world environments demonstrate PCD's flexibility and effectiveness; for example, PCD improves the state-of-the-art policy $\pi_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
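To make the contrastive-decoding idea concrete, the sketch below shows one way action logits from the original observation can be contrasted against logits from an object-masked observation to down-weight cues shared by both views. The function name, the linear combination rule, and the weight `alpha` are illustrative assumptions, not the paper's exact formulation, and the handling of diffusion-based policies such as Octo and $\pi_0$ may differ from this discrete-token version.

```python
import numpy as np

def policy_contrastive_decoding(logits_original, logits_masked, alpha=0.5):
    """Minimal sketch of contrastive decoding over discrete action logits.

    `logits_original` come from the unmodified visual input and
    `logits_masked` from the same input with task-relevant objects masked.
    Subtracting the masked branch suppresses evidence that survives object
    removal (the hypothesized spurious correlations), while `alpha` scales
    how strongly the contrast is applied. All names and the combination
    rule here are assumptions for illustration only.
    """
    contrasted = (1.0 + alpha) * logits_original - alpha * logits_masked
    # Convert the contrasted logits back into a probability distribution
    # over action tokens via a numerically stable softmax.
    shifted = contrasted - contrasted.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```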