In policy learning for robotic manipulation, sample efficiency is of paramount importance. Learning and extracting more compact representations from camera observations is therefore a promising avenue. However, current methods often assume full observability of the scene and struggle with scale invariance. In many tasks and settings, the full-observability assumption does not hold: objects in the scene are often occluded or lie outside the camera's field of view, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects as well as on occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects, both in simulation and in real-world robotic experiments.
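To make the idea of Bayesian tracking over time more concrete, the following is a minimal sketch of a generic discrete Bayes filter over a one-dimensional grid of candidate keypoint locations. It is not the BASK implementation, whose details the abstract does not give. The function names (`predict`, `update`, `track`), the grid size, the motion kernel, and the likelihood values are all illustrative assumptions. The sketch shows two effects named in the abstract: a prior belief can disambiguate a two-peaked likelihood, as might arise from a symmetrical object, and the belief can be carried forward when no observation is available, as with an occluded or out-of-view keypoint.

```python
def normalize(p):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(p)
    if total <= 0.0:
        raise ValueError("belief has no probability mass")
    return [x / total for x in p]


def predict(belief, kernel):
    """Motion step: spread the belief with an odd-length kernel centered on zero shift."""
    n, half = len(belief), len(kernel) // 2
    out = [0.0] * n
    for i, b in enumerate(belief):
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp mass at the grid borders
            out[j] += b * w
    return normalize(out)


def update(belief, likelihood):
    """Measurement step (Bayes' rule): posterior is proportional to likelihood * prior."""
    return normalize([b * l for b, l in zip(belief, likelihood)])


def track(prior, observations, kernel):
    """Run the filter; an observation of None means no measurement at that step."""
    belief = normalize(prior)
    for likelihood in observations:
        belief = predict(belief, kernel)
        if likelihood is not None:  # hypothetical convention: None = occluded / out of view
            belief = update(belief, likelihood)
    return belief


# Illustrative values only (assumptions, not taken from the paper).
prior = [0.05, 0.80, 0.05, 0.05, 0.05]      # keypoint believed to be near cell 1
kernel = [0.1, 0.8, 0.1]                    # small random motion between frames
ambiguous = [0.1, 1.0, 0.1, 1.0, 0.1]       # two equally likely peaks (cells 1 and 3)
observations = [ambiguous, None, ambiguous]  # the middle frame has no observation

final_belief = track(prior, observations, kernel)
best_cell = max(range(len(final_belief)), key=final_belief.__getitem__)
print(best_cell, [round(p, 3) for p in final_belief])
```

In this toy setting the likelihood alone cannot distinguish cell 1 from cell 3, but the filtered belief stays concentrated on cell 1 because the prior carries information from earlier frames. During the step without an observation the belief only diffuses according to the motion kernel.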