Object pose estimation is a core computer vision problem and often an essential component in robotics. Pose estimation is usually approached by seeking the single best estimate of an object's pose, but this approach is ill-suited for tasks involving visual ambiguity. In such cases, it is desirable to estimate the uncertainty as a pose distribution so that downstream tasks can make informed decisions. Pose distributions can have arbitrary complexity, which motivates estimating unparameterized distributions. However, until now such distributions have only been used for orientation estimation on SO(3), because of the difficulty of training on and normalizing over SE(3). We propose a novel method for pose distribution estimation on SE(3). We use a hierarchical grid, a pyramid, which enables efficient importance sampling during training and sparse evaluation of the pyramid at inference, allowing real-time 6D pose distribution estimation. Our method outperforms state-of-the-art methods on SO(3), and to the best of our knowledge, we provide the first quantitative results on pose distribution estimation on SE(3). Code will be available at spyropose.github.io.
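To make the idea of sparse evaluation on a hierarchical grid concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a toy interval [0, 1) in place of SE(3), a hand-written bimodal `score` function standing in for a learned network, and hypothetical parameters `levels`, `base_cells`, and `top_k`. It covers only coarse-to-fine inference and leaves out importance sampling during training.

```python
import numpy as np

def score(x):
    """Hypothetical unnormalized log-density with two modes (stand-in for a learned model)."""
    return np.logaddexp(-0.5 * ((x - 0.2) / 0.02) ** 2,
                        -0.5 * ((x - 0.7) / 0.02) ** 2)

def sparse_pyramid_eval(score_fn, levels=8, base_cells=4, top_k=4):
    """Coarse-to-fine evaluation on [0, 1).

    At each level, only the top_k highest-scoring cells are kept and each is
    split into two children, so the finest level is evaluated sparsely.
    """
    idx = np.arange(base_cells)   # indices of the cells evaluated at this level
    n_cells = base_cells          # number of cells a dense grid would have at this level
    n_evals = 0
    for level in range(levels):
        centers = (idx + 0.5) / n_cells
        logits = score_fn(centers)
        n_evals += len(idx)
        if level == levels - 1:
            break
        keep = idx[np.argsort(logits)[-top_k:]]              # most likely cells
        idx = np.sort(np.concatenate([2 * keep, 2 * keep + 1]))  # their children
        n_cells *= 2
    # Normalize over the surviving finest-level cells only (an approximation:
    # mass in pruned cells is treated as zero).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return centers, probs, n_evals

centers, probs, n_evals = sparse_pyramid_eval(score)
print(f"evaluations: {n_evals} (a dense grid at the finest level has 512 cells)")
print(f"most likely cell center: {centers[probs.argmax()]:.4f}")
```

In this toy setting the number of score evaluations grows linearly with the number of levels, whereas a dense grid grows exponentially. That scaling is what makes a fine resolution affordable; how it carries over to a six-dimensional pose space depends on details not given here.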