Surface reconstruction from multi-view images is a challenging task, with solutions often requiring a large number of sampled images with high overlap. We seek to develop a method for few-view reconstruction of the human foot. To solve this task, we must extract rich geometric cues from RGB images before carefully fusing them into a final 3D object. Our FOUND approach tackles this with four main contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images, paired with ground-truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high-resolution ground-truth geometry. We show that our normal predictor significantly outperforms all off-the-shelf equivalents on real images, and that our optimization scheme outperforms state-of-the-art photogrammetry pipelines, especially in the few-view setting. We release our synthetic dataset and baseline 3D scans to the research community.