3D object reconstruction is important for semantic scene understanding. Reconstructing detailed 3D shapes directly from monocular images is challenging because of missing depth information, occlusion, and noise. Most current methods generate deterministic object models with no awareness of the uncertainty of the reconstruction. We tackle this problem by leveraging a neural object representation that learns an object shape distribution from a large dataset of 3D object models and maps it into a latent space. We propose a method to model uncertainty as part of the representation and define an uncertainty-aware encoder that generates latent codes with uncertainty directly from individual input images. Further, we propose a method to propagate the uncertainty in the latent code to signed distance function (SDF) values and generate a 3D object mesh with local uncertainty for each mesh component. Finally, we propose an incremental fusion method under a Bayesian framework to fuse the latent codes from multi-view observations. We evaluate the system on both synthetic and real datasets to demonstrate the effectiveness of uncertainty-based fusion in improving 3D object reconstruction accuracy.
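To make the idea of incremental Bayesian fusion of latent codes concrete, the following is a minimal sketch and not the implementation described above. It assumes each view yields a Gaussian estimate of the latent code with diagonal covariance and that views are independent; under those assumptions, fusion reduces to a per-dimension product of Gaussians. The names `LatentEstimate`, `fuse`, and `fuse_views` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LatentEstimate:
    """Gaussian belief over a latent shape code (assumed diagonal covariance)."""
    mean: List[float]
    var: List[float]  # per-dimension variance, must be strictly positive


def fuse(belief: LatentEstimate, obs: LatentEstimate) -> LatentEstimate:
    """Fuse two independent Gaussian estimates, dimension by dimension.

    Precisions (inverse variances) add, and the fused mean is the
    precision-weighted average of the two means.
    """
    sizes = {len(belief.mean), len(belief.var), len(obs.mean), len(obs.var)}
    if len(sizes) != 1:
        raise ValueError("means and variances must all have the same length")

    mean, var = [], []
    for m1, v1, m2, v2 in zip(belief.mean, belief.var, obs.mean, obs.var):
        if v1 <= 0.0 or v2 <= 0.0:
            raise ValueError("variances must be strictly positive")
        fused_var = 1.0 / (1.0 / v1 + 1.0 / v2)
        mean.append(fused_var * (m1 / v1 + m2 / v2))
        var.append(fused_var)
    return LatentEstimate(mean, var)


def fuse_views(estimates: List[LatentEstimate]) -> LatentEstimate:
    """Incrementally fold per-view estimates into a single belief."""
    if not estimates:
        raise ValueError("need at least one estimate")
    belief = estimates[0]
    for obs in estimates[1:]:
        belief = fuse(belief, obs)
    return belief


if __name__ == "__main__":
    # Two hypothetical views of the same object with a 2-D latent code.
    views = [
        LatentEstimate(mean=[0.0, 1.0], var=[1.0, 4.0]),
        LatentEstimate(mean=[2.0, 1.0], var=[1.0, 4.0]),
    ]
    fused = fuse_views(views)
    print(fused.mean, fused.var)
```

In this sketch, a view with small variance in some dimension dominates the fused code in that dimension, and the fused variance never exceeds that of either input, which is the behavior one would expect from uncertainty-weighted multi-view fusion.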