Reconstructing high-fidelity 3D head geometry from images is critical for a wide range of applications, yet existing methods face fundamental limitations. Traditional photogrammetry achieves exceptional detail but requires extensive camera arrays (25-200+ views), substantial computation, and manual cleanup in challenging areas like facial hair. Recent alternatives present a stark trade-off: foundation models enable efficient single-image reconstruction but lack fine geometric detail, while optimization-based methods achieve higher fidelity but require dense views and expensive computation. We bridge this gap with a hybrid approach that combines the strengths of both paradigms. Our method introduces a multi-view surface normal prediction model that extends monocular foundation models with cross-view attention to produce geometrically consistent normals in a single feed-forward pass. We then leverage these predictions as strong geometric priors within an inverse rendering optimization framework to recover high-frequency surface details. Our approach outperforms state-of-the-art single-image and multi-view methods, achieving high-fidelity reconstruction on par with dense-view photogrammetry while reducing camera requirements and computational cost. The code and model will be released.
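The cross-view attention mentioned above can be illustrated with a minimal sketch: patch tokens from all views are flattened into one joint sequence, so each token attends to tokens from every view rather than only its own, which is the mechanism that lets per-view normal predictions become mutually consistent. The function name, shapes, and single-head formulation below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(tokens, Wq, Wk, Wv):
    """Single-head cross-view attention (illustrative sketch).

    tokens: (V, N, D) patch tokens from V views, N tokens per view.
    Keys/values are pooled across ALL views, so every token can
    exchange information with every other view's tokens.
    """
    V, N, D = tokens.shape
    q = (tokens @ Wq).reshape(V * N, D)   # queries, views flattened
    kv = tokens.reshape(V * N, D)         # joint sequence over views
    k, v = kv @ Wk, kv @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))  # (V*N, V*N) attention map
    out = attn @ v                        # aggregate across views
    return out.reshape(V, N, D)

# toy usage: 4 views, 8 tokens each, 16-dim features
rng = np.random.default_rng(0)
D = 16
toks = rng.standard_normal((4, 8, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out = cross_view_attention(toks, Wq, Wk, Wv)
print(out.shape)  # (4, 8, 16)
```

In a monocular foundation model, such a block would be interleaved with (or added to) the per-view self-attention layers, leaving the pretrained single-view weights otherwise intact.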