We present Robot-centric Pooling (RcP), a novel pooling method designed to enhance end-to-end visuomotor policies by enabling them to distinguish the robot from similar-looking entities and from its surroundings. Given an image-proprioception pair, RcP guides the aggregation of image features by highlighting the image regions that correlate with the robot's proprioceptive state, thereby extracting robot-centric image representations for policy learning. Built on contrastive learning, RcP integrates seamlessly with existing visuomotor policy learning frameworks and is trained jointly with the policy on the same dataset, requiring no extra data collection involving self-distractors. We evaluate the proposed method on reaching tasks in both simulated and real-world settings. The results demonstrate that RcP significantly improves the policies' robustness to various unseen distractors, including self-distractors, placed at different locations. Additionally, the inherently robot-centric nature of RcP makes the learnt policy far more resilient to aggressive pixel shifts than the baselines.
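To make the idea of proprioception-guided aggregation concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how image regions are scored, so plain dot-product attention is used here as a stand-in, and the contrastive training objective is omitted. The names `proprio_guided_pool`, `features`, and `query` are hypothetical, and `query` is assumed to be a proprioceptive state already embedded into the same space as the image features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def proprio_guided_pool(features, query):
    """Pool per-location image features, weighted by similarity to a query.

    features: list of N feature vectors, one per spatial location, each of length C.
    query:    length-C vector; ASSUMED to be an embedding of the robot's
              proprioceptive state (the embedding network is not shown).
    Returns (pooled, weights): a length-C pooled vector and the N attention weights.

    Scoring rule is an assumption: scaled dot product, as in standard attention.
    """
    dim = len(query)
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[c] for w, feat in zip(weights, features))
        for c in range(dim)
    ]
    return pooled, weights


# Toy input: 4 spatial locations, 2 channels. Location 0 matches the query;
# locations 1 and 3 stand in for distractors with strong but unrelated features.
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 3.0]]
query = [2.0, 0.0]
pooled, weights = proprio_guided_pool(features, query)
```

In this toy case nearly all of the weight falls on location 0, so the pooled vector is dominated by the query-matching features. Global average pooling over the same input would give `[1.0, 1.75]`, letting the distractor channel dominate.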