The ability to construct concise scene representations from sensor input is central to robotics. This paper addresses the problem of robustly creating a 3D representation of a tabletop scene from a segmented RGB-D image. Such representations are critical for a range of downstream manipulation tasks. Many previous approaches to this problem do not capture accurate uncertainty, which is required to subsequently produce safe motion plans. We cast the representation of 3D tabletop scenes as a multi-class classification problem. To tackle this, we introduce \ourmethod{}, a framework and method for robustly creating probabilistic 3D segmentation maps of tabletop scenes. Our maps contain occupancy estimates, segmentation information, and principled uncertainty measures. We evaluate the robustness of our method on (1) procedurally generated scenes using open-source object datasets, and (2) real-world tabletop data collected from a depth camera. Our experiments show that our approach outperforms alternative continuous reconstruction approaches that do not explicitly reason about objects in a multi-class formulation.