The ability to construct concise scene representations from sensor input is central to robotics. This paper addresses the problem of robustly creating a 3D representation of a tabletop scene from a segmented RGB-D image. Such representations are critical for a range of downstream manipulation tasks. Many previous approaches to this problem do not capture accurate uncertainty, which is required to subsequently produce safe motion plans. We cast the representation of 3D tabletop scenes as a multi-class classification problem. To solve it, we introduce V-PRISM, a framework and method for robustly creating probabilistic 3D segmentation maps of tabletop scenes. Our maps contain occupancy estimates, segmentation information, and principled uncertainty measures. We evaluate the robustness of our method in (1) procedurally generated scenes using open-source object datasets, and (2) real-world tabletop data collected from a depth camera. Our experiments show that our approach outperforms alternative continuous reconstruction approaches that do not explicitly reason about objects in a multi-class formulation.
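To make the multi-class formulation concrete, the following is a minimal illustrative sketch and not V-PRISM's implementation. It assumes that some model has already produced per-class scores (logits) at a set of 3D query points, with class 0 standing for free space and the remaining classes for segmented objects. The function names, the class layout, and the use of predictive entropy as a stand-in uncertainty measure are all assumptions made for illustration; the abstract does not specify how the uncertainty is actually computed.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracts the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def summarize_map(logits):
    """Turn per-point class scores into occupancy, labels, and entropy.

    logits: (N, K + 1) array of scores at N query points.
            Column 0 is a hypothetical "free space" class (assumption);
            columns 1..K are object classes from the segmentation.
    """
    probs = softmax(logits)
    # Occupancy: probability that the point belongs to any object.
    occupancy = 1.0 - probs[:, 0]
    # Segmentation: most likely class at each point.
    labels = probs.argmax(axis=1)
    # Predictive entropy as a simple (assumed) uncertainty proxy.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return occupancy, labels, entropy


# Three hypothetical query points: clearly free, clearly object 1, ambiguous.
logits = np.array([
    [4.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],
])
occupancy, labels, entropy = summarize_map(logits)
```

In this sketch a single set of class probabilities yields all three quantities named in the abstract: occupancy is one minus the free-space probability, segmentation is the most likely class, and the ambiguous point receives the highest entropy.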