Constructing 3D representations of object geometry is critical for many downstream robotics tasks, particularly tabletop manipulation. These representations must be built from potentially noisy partial observations. In this work, we focus on reconstructing a multi-object scene from a single RGBD image, typically captured by a fixed camera in the scene. Traditional scene representation methods generally cannot infer the geometry of object regions that are unobserved in the image. Deep learning approaches instead train on a dataset of observed objects and their representations and then generalize to new observations; however, they can be brittle to noisy real-world observations and to objects not contained in the training dataset, and they cannot reason about their own confidence. We propose BRRP, a reconstruction method that leverages preexisting mesh datasets to build an informative prior for robust probabilistic reconstruction. To make inference more efficient, we introduce the concept of a retrieval-augmented prior, in which we retrieve only the relevant components of the prior distribution during inference. The prior is used to estimate the geometry of occluded portions of the in-scene objects. Our method produces a distribution over object shape that can be used for reconstruction or for measuring uncertainty. We evaluate our method in both simulated scenes and the real world, demonstrating that it is more robust than deep learning-only approaches and more accurate than a method without an informative prior.
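The idea of a retrieval-augmented prior can be sketched as follows: embed the observed partial object, then retrieve only the most similar shapes from the prior dataset and use them as components of the prior during inference. This is a minimal illustration, assuming embedding vectors and cosine-similarity retrieval with softmax mixture weights; the function names, the embedding space, and the retrieval rule are illustrative assumptions, not BRRP's exact formulation.

```python
import numpy as np

def retrieve_prior_components(observation_emb, prior_embs, k=3):
    """Hypothetical sketch: return indices and normalized weights of the
    k prior shape embeddings most similar to the observation embedding."""
    obs = observation_emb / np.linalg.norm(observation_emb)
    prior = prior_embs / np.linalg.norm(prior_embs, axis=1, keepdims=True)
    sims = prior @ obs                      # cosine similarity to each prior shape
    top_k = np.argsort(sims)[::-1][:k]     # indices of the k best matches
    w = np.exp(sims[top_k] - sims[top_k].max())
    return top_k, w / w.sum()              # mixture weights over retrieved components

# Toy usage: random stand-ins for dataset shape embeddings, with the
# observation being a noisy view of shape 7.
rng = np.random.default_rng(0)
prior_embs = rng.normal(size=(100, 32))
obs_emb = prior_embs[7] + 0.1 * rng.normal(size=32)
idx, weights = retrieve_prior_components(obs_emb, prior_embs, k=3)
```

Restricting inference to the retrieved components keeps the per-object cost independent of the full prior dataset size, which is the efficiency motivation stated above.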