Reconstructing three-dimensional (3D) scenes with semantic understanding is vital in many robotic applications. Robots need to identify objects, along with their positions and shapes, in order to manipulate them precisely for given tasks. Mobile robots in particular often use lightweight networks to segment objects in RGB images and then localize them via depth maps; however, these networks frequently encounter out-of-distribution scenarios in which the predicted masks over-cover the objects. In this paper, we address the problem of panoptic segmentation quality in 3D scene reconstruction by refining segmentation errors with non-parametric statistical methods. To enhance mask precision, we map the predicted masks into the depth frame and estimate their depth distribution via kernel density estimation. Depth outliers are then rejected adaptively, without additional parameters, making the method robust to out-of-distribution scenarios; 3D reconstruction follows using projective signed distance functions (SDFs). We validate our method on a synthetic dataset, demonstrating both quantitative and qualitative improvements in panoptic mapping. Real-world experiments further show that our method can be deployed on a real robot system. Our source code is available at: https://github.com/mkhangg/refined_panoptic_mapping.
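The core refinement step described above — estimating the depth distribution of a predicted mask with a kernel density and rejecting low-density outlier pixels — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `refine_mask_by_depth` and the `density_frac` cutoff (keep pixels whose density exceeds a fraction of the peak density) are hypothetical choices introduced here; only the bandwidth is set non-parametrically, via SciPy's default Scott's rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

def refine_mask_by_depth(mask, depth, density_frac=0.1):
    """Reject mask pixels whose depth lies in a low-density region.

    mask: boolean HxW array (predicted segmentation mask).
    depth: float HxW array (aligned depth frame, in meters).
    density_frac: hypothetical relative cutoff -- keep pixels whose
        KDE density is at least this fraction of the peak density.
    """
    vals = depth[mask]
    valid = np.isfinite(vals) & (vals > 0)   # drop invalid depth readings

    # Non-parametric density estimate of the masked depths
    # (bandwidth chosen automatically via Scott's rule).
    kde = gaussian_kde(vals[valid])

    dens = np.zeros_like(vals)
    dens[valid] = kde(vals[valid])
    keep = dens >= density_frac * dens.max()  # adaptive, no fixed depth cutoff

    refined = np.zeros_like(mask)
    refined[mask] = keep                      # boolean indexing preserves order
    return refined
```

Because the cutoff is relative to the estimated density peak rather than a fixed depth threshold, the rejection adapts to each object's depth spread, which matches the abstract's claim of handling out-of-distribution scenes without extra tuned parameters.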