Accurately reconstructing both the geometric and topological details of a 3D object from a single 2D image is a fundamental challenge in computer vision. Existing explicit and implicit solutions struggle to recover self-occluded geometry, to faithfully reconstruct topological shape structures, or both. To address this problem, we introduce LIST, a novel neural architecture that leverages local and global image features to accurately reconstruct the geometric and topological structure of a 3D object from a single image. We use global 2D features to predict a coarse shape of the target object and then use that shape as the base for a higher-resolution reconstruction. By leveraging both local 2D features from the image and 3D features from the coarse prediction, an implicit predictor can accurately estimate the signed distance between an arbitrary point and the target surface. Furthermore, our model requires neither camera estimation nor pixel alignment, so its reconstruction is not influenced by the input-view direction. Through qualitative and quantitative analysis, we show that our model outperforms the state of the art in reconstructing 3D objects from both synthetic and real-world images.
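To make the notion of a signed distance between a query point and a surface concrete, the following minimal sketch evaluates an analytic signed distance function for a sphere. It is only an illustration of the quantity an implicit predictor outputs, not the LIST network itself: the names `sphere_sdf` and `is_inside` are hypothetical, and the sign convention (negative inside the surface, positive outside) is an assumption rather than something stated in the abstract.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere surface.

    Assumed convention: negative inside, zero on the surface, positive outside.
    A learned implicit predictor would replace this closed-form expression.
    """
    return math.dist(point, center) - radius


def is_inside(sdf_value):
    """A point lies inside or on the surface when its signed distance is <= 0."""
    return sdf_value <= 0.0


# Query three arbitrary points: the sphere center, a surface point, an outside point.
queries = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
distances = [sphere_sdf(q) for q in queries]
inside_flags = [is_inside(d) for d in distances]
```

The reconstructed surface is the zero level set of such a function, so a mesh could in principle be extracted by evaluating the function on a dense grid of query points and locating the sign changes.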