We propose MVPSNet, a fast and generalizable solution to Multi-view Photometric Stereo (MVPS). The key to our approach is a feature extraction network that combines images of the same view captured under multiple lighting conditions, extracting geometric features from shading cues for stereo matching. We demonstrate that these features, termed `Light Aggregated Feature Maps' (LAFM), support reliable feature matching even in textureless regions, where traditional multi-view stereo methods fail. Our method produces reconstruction results comparable to those of PS-NeRF, a state-of-the-art MVPS method that optimizes a neural network per scene, while being 411$\times$ faster at inference (105 seconds vs. 12 hours). Additionally, we introduce sMVPS, a new synthetic dataset for MVPS, and show that it is effective for training a generalizable MVPS method.
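To make the idea of combining several lighting conditions into one per-view feature map concrete, here is a minimal sketch. It rests on assumptions that the abstract does not state: the aggregation operator (an element-wise max over lights, chosen because it is invariant to the order and number of lights), the single-channel feature layout, and the names `toy_encoder` and `aggregate_over_lights` are all hypothetical. The toy gradient "encoder" merely stands in for a learned network.

```python
from typing import Callable, List, Sequence

FeatureMap = List[List[float]]  # H x W grid, one channel for brevity


def toy_encoder(image: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for a shared-weight encoder.

    Returns the absolute horizontal intensity difference at each pixel,
    a crude shading cue. The last column is compared with itself (-> 0).
    """
    return [
        [abs(row[min(x + 1, len(row) - 1)] - row[x]) for x in range(len(row))]
        for row in image
    ]


def aggregate_over_lights(
    images: Sequence[FeatureMap],
    encoder: Callable[[FeatureMap], FeatureMap] = toy_encoder,
) -> FeatureMap:
    """Encode each lighting condition with the same encoder, then keep the
    element-wise maximum response across lights (assumed aggregation)."""
    if not images:
        raise ValueError("need at least one image per view")
    per_light = [encoder(img) for img in images]
    height, width = len(per_light[0]), len(per_light[0][0])
    return [
        [max(feat[y][x] for feat in per_light) for x in range(width)]
        for y in range(height)
    ]


# Two 1x3 images of the same view under different lights.
images = [[[0.0, 1.0, 1.0]], [[0.0, 0.0, 2.0]]]
aggregated = aggregate_over_lights(images)
```

Each light reveals an intensity edge at a different pixel, and the aggregated map retains both responses. In this sketch that is how shading variation across lights supplies matchable structure where a single image is flat.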