In policy learning for robotic manipulation, sample efficiency is of paramount importance. Thus, learning and extracting more compact representations from camera observations is a promising avenue. However, current methods often assume full observability of the scene and struggle to achieve scale invariance. In many tasks and settings, the full-observability assumption does not hold, as objects in the scene are often occluded or lie outside the field of view of the camera, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects as well as on occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects, both in simulation and in real-world robotic experiments.