Reconstructing real-world 3D objects has numerous applications in computer vision, such as virtual reality, video games, and animation. Ideally, 3D reconstruction methods should generate high-fidelity, 3D-consistent results in real time. Traditional methods match pixels between images using photo-consistency constraints or learned features, while differentiable rendering methods such as Neural Radiance Fields (NeRF) use differentiable volume rendering or surface-based representations to generate high-fidelity scenes. However, these methods require excessive rendering time, making them impractical for everyday applications. To address these challenges, we present $\textbf{EvaSurf}$, an $\textbf{E}$fficient $\textbf{V}$iew-$\textbf{A}$ware implicit textured $\textbf{Surf}$ace reconstruction method. In our method, we first employ an efficient surface-based model with a multi-view supervision module to ensure accurate mesh reconstruction. To enable high-fidelity rendering, we learn an implicit texture embedded with a view-aware encoding to capture view-dependent information. Furthermore, given the explicit geometry and the implicit texture, we can employ a lightweight neural shader to reduce the computational cost and thereby support real-time rendering on common mobile devices. Extensive experiments demonstrate that our method reconstructs high-quality appearance and accurate meshes on both synthetic and real-world datasets. Moreover, our method can be trained in just 1-2 hours on a single GPU and runs on mobile devices at over 40 frames per second (FPS), with the final package required for rendering occupying only 40-50 MB.
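To make the shading pipeline concrete, the following is a minimal sketch of the *kind* of lightweight neural shader the abstract describes: a tiny MLP that maps a per-surface-point implicit texture feature, concatenated with a view-aware encoding of the viewing direction, to an RGB color. All dimensions, the frequency-based view encoding, and the two-layer architecture are illustrative assumptions, not EvaSurf's actual design.

```python
import numpy as np

# Assumed sizes for illustration only (not from the paper).
FEAT_DIM = 8        # per-surface-point implicit texture feature
VIEW_ENC_DIM = 12   # frequency encoding of the 3D view direction
HIDDEN = 16         # tiny hidden layer keeps the shader mobile-friendly

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (FEAT_DIM + VIEW_ENC_DIM, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, 3))

def encode_view(d):
    """Frequency-encode a unit view direction d of shape (3,) -> (12,)."""
    freqs = np.array([1.0, 2.0])
    angles = np.outer(freqs, d).ravel()                       # (6,)
    return np.concatenate([np.sin(angles), np.cos(angles)])   # (12,)

def shade(feature, view_dir):
    """One shader forward pass: feature (8,), view_dir (3,) -> RGB in [0, 1]."""
    x = np.concatenate([feature, encode_view(view_dir)])
    h = np.maximum(x @ W1, 0.0)                # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))     # sigmoid -> RGB color

feat = rng.normal(size=FEAT_DIM)
rgb = shade(feat, np.array([0.0, 0.0, 1.0]))
print(rgb.shape)
```

Because the view direction enters the shader directly, the same surface point can produce different colors from different viewpoints, which is how view-dependent effects such as specular highlights can be captured by an otherwise bake-able texture.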