3D reconstruction aims to recover the dense 3D structure of a scene and plays an essential role in applications such as Augmented/Virtual Reality (AR/VR), autonomous driving, and robotics. Multi-View Stereo (MVS) algorithms leverage multiple images of a scene captured from different viewpoints to synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Owing to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance compared with traditional methods. We categorize these learning-based methods as depth map-based, voxel-based, NeRF-based, 3D Gaussian Splatting-based, and large feed-forward methods. Among these, we focus primarily on depth map-based methods, which form the main family of MVS owing to their conciseness, flexibility, and scalability. In this survey, we provide a comprehensive review of the literature as of this writing: we investigate these learning-based methods, summarize their performance on popular benchmarks, and discuss promising future research directions in this area.