In this work we propose a novel, highly practical binocular photometric stereo (PS) framework, which has the same acquisition speed as single-view PS but significantly improves the quality of the estimated geometry. As in recent neural multi-view shape estimation frameworks such as NeRF, SIREN and inverse graphics approaches to multi-view photometric stereo (e.g. PS-NeRF), we formulate the shape estimation task as learning a differentiable surface and texture representation. This is done by minimising the surface normal discrepancy with respect to normals estimated from multiple images under varying illumination for the two views, as well as the discrepancy between the rendered surface intensity and the observed images. Our method differs from typical multi-view shape estimation approaches in two key ways. First, our surface is represented not as a volume but as a neural heightmap, where the heights of points on the surface are computed by a deep neural network. Second, instead of predicting an average intensity as in PS-NeRF or introducing Lambertian material assumptions as in Guo et al., we use a learnt BRDF and perform near-field, per-point intensity rendering. Our method achieves state-of-the-art performance on the DiLiGenT-MV dataset adapted to a binocular stereo setup, as well as on a new binocular photometric stereo dataset, LUCES-ST.
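To make the two ingredients named above more concrete, the following is a minimal sketch of a neural heightmap and of the geometric part of near-field point-light rendering. It is not the authors' implementation. The network size, the tanh activations, the finite-difference normals, the orthographic parametrisation z = f(x, y) and all function names (`init_mlp`, `height_mlp`, `normals_from_height`, `near_field_light`) are assumptions made for illustration, and the learnt BRDF is omitted entirely.

```python
import numpy as np

def init_mlp(rng, sizes=(2, 32, 32, 1)):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def height_mlp(params, xy):
    """Map (N, 2) surface coordinates to (N,) heights with a tanh MLP."""
    h = xy
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[:, 0]

def normals_from_height(height_fn, xy, eps=1e-4):
    """Unit normals of the surface z = height_fn(x, y) via central differences.

    Assumes an orthographic parametrisation, so the unnormalised normal
    is (-dz/dx, -dz/dy, 1).
    """
    dx = np.array([eps, 0.0])
    dy = np.array([0.0, eps])
    dz_dx = (height_fn(xy + dx) - height_fn(xy - dx)) / (2 * eps)
    dz_dy = (height_fn(xy + dy) - height_fn(xy - dy)) / (2 * eps)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=1)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def near_field_light(points, light_pos):
    """Per-point unit light direction and inverse-square falloff of a point light."""
    to_light = light_pos[None, :] - points
    dist = np.linalg.norm(to_light, axis=1, keepdims=True)
    return to_light / dist, 1.0 / dist[:, 0] ** 2

# Usage: evaluate heights, normals and per-point lighting at a few sample points.
rng = np.random.default_rng(0)
params = init_mlp(rng)
xy = rng.uniform(-1.0, 1.0, size=(5, 2))
z = height_mlp(params, xy)
normals = normals_from_height(lambda p: height_mlp(params, p), xy)
points = np.column_stack([xy, z])
light_dir, falloff = near_field_light(points, np.array([0.0, 0.0, 2.0]))
```

In a training setting one would presumably use automatic differentiation rather than finite differences, and would compare `normals` against the PS-estimated normals of each view; the sketch only illustrates why a heightmap yields normals directly and why near-field lighting varies per point.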