In this paper, we propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior. Unlike recent prior-free MVS methods that operate in a pair-wise manner, our method considers all source images simultaneously. Specifically, we introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information within and across multi-view images. Given the asymmetry of the epipolar disparity flow, the key to our method lies in accurately modeling multi-view geometric constraints. We integrate a pose embedding that encapsulates information such as the multi-view camera poses, providing implicit geometric constraints for the attention-driven fusion of multi-view disparity features. Additionally, because the observation quality of a reference-frame pixel can differ significantly across source frames, we maintain a separate hidden state for each source image. An uncertainty estimation module explicitly estimates the matching quality of the sampled points on each source image's epipolar line and dynamically updates the hidden states accordingly. Extensive experiments on the DTU dataset and the Tanks & Temples benchmark demonstrate the effectiveness of our method. The code is available at our project page: https://zju3dv.github.io/GD-PoseMVS/.
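To make the fusion mechanism concrete, the following is a minimal NumPy sketch of one attention-fusion step for a single reference pixel: pose embeddings are added to the keys as an implicit geometric constraint, attention aggregates the per-view features, and a per-view quality score gates the hidden-state update. This is an illustrative assumption, not the authors' implementation; `mda_step`, the cosine-similarity quality proxy (standing in for the learned uncertainty module), and the gating rule are all hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mda_step(ref_feat, src_feats, pose_embed, hidden):
    """One illustrative fusion step for a single reference pixel.

    ref_feat:   (D,)   feature of the reference-frame pixel
    src_feats:  (V, D) matched features sampled on each source epipolar line
    pose_embed: (V, D) per-view pose embedding (implicit geometric constraint)
    hidden:     (V, D) per-source-view hidden states
    Returns the fused feature (D,), updated hidden states (V, D),
    attention weights (V,), and per-view quality scores (V,).
    """
    d = ref_feat.shape[-1]
    # Keys carry the pose embedding so the attention is geometry-aware.
    keys = src_feats + pose_embed
    logits = keys @ ref_feat / np.sqrt(d)
    attn = softmax(logits)                  # cross-view attention weights
    fused = attn @ src_feats                # aggregated multi-view feature

    # Quality proxy: cosine similarity between reference and each source
    # feature, squashed to (0, 1); a stand-in for a learned uncertainty head.
    cos = (src_feats @ ref_feat) / (
        np.linalg.norm(src_feats, axis=-1) * np.linalg.norm(ref_feat) + 1e-8)
    quality = 1.0 / (1.0 + np.exp(-cos))

    # Gated update: views judged higher-quality absorb the fused feature faster.
    hidden = quality[:, None] * fused + (1.0 - quality[:, None]) * hidden
    return fused, hidden, attn, quality
```

Per-view hidden states let a low-quality view (e.g. an occluded epipolar sample) retain its previous state instead of being overwritten by the fused feature.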