6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to capture complete depth from a single viewpoint because of specular reflection, which causes a significant drop in the final pose accuracy. To mitigate this issue, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction. Specifically, we first develop an optimization-based pose refinement module for the structured light camera. Our system then selects the next best camera viewpoint for collecting depth measurements by minimizing the predicted uncertainty of the object pose. Unlike previous approaches, we also predict the measurement uncertainties of future viewpoints by online rendering, which significantly improves next-best-view prediction performance. We test our approach on the challenging real-world ROBI dataset. The results demonstrate that our pose refinement method outperforms the traditional ICP-based approach when given the same input depth data, and that our next-best-view strategy achieves high object pose accuracy with significantly fewer viewpoints than heuristic-based policies.
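As an illustration of selecting a viewpoint by minimizing predicted pose uncertainty, the following is a minimal sketch and not the method evaluated in this work. It assumes a linear-Gaussian model in which each candidate view contributes a predicted residual Jacobian and per-measurement variances, and it scores each view by the trace of the fused 6x6 pose covariance (an A-optimality criterion chosen here for simplicity). The names `posterior_covariance` and `select_next_view` and all inputs are hypothetical.

```python
import numpy as np


def posterior_covariance(prior_cov, jacobian, meas_var):
    """Fuse a 6x6 prior pose covariance with predicted measurements.

    Assumption: linear-Gaussian model. Each row of `jacobian` (N x 6) maps a
    small pose perturbation to one scalar residual, and `meas_var` (N,) holds
    the predicted variance of that residual.
    """
    info_prior = np.linalg.inv(prior_cov)
    weights = 1.0 / np.asarray(meas_var, dtype=float)
    # Information contributed by the candidate view: J^T R^{-1} J
    info_meas = jacobian.T @ (weights[:, None] * jacobian)
    return np.linalg.inv(info_prior + info_meas)


def select_next_view(prior_cov, candidates):
    """Return (index, scores) of the candidate with the smallest covariance trace.

    `candidates` is a list of (jacobian, meas_var) pairs, one per viewpoint.
    """
    scores = [
        float(np.trace(posterior_covariance(prior_cov, jac, var)))
        for jac, var in candidates
    ]
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    prior = np.eye(6) * 1e-2  # hypothetical current pose covariance
    jac = np.eye(6)           # hypothetical residual Jacobian
    views = [
        (jac, np.full(6, 1.0)),   # view with noisy predicted depth
        (jac, np.full(6, 0.01)),  # view with cleaner predicted depth
    ]
    best, scores = select_next_view(prior, views)
    print("selected view:", best, "scores:", scores)
```

In this sketch, a view whose predicted measurements have lower variance yields a smaller fused covariance and is therefore preferred; predicting those variances per viewpoint is the role the abstract assigns to online rendering.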