Pairwise matching cost aggregation is a crucial step in modern learning-based Multi-view Stereo (MVS). Prior works adopt an early aggregation scheme, which adds up pairwise costs into a single intermediate cost. However, our analysis shows that this process can degrade informative pairwise matches, thereby preventing the depth network from fully utilizing the original geometric matching cues. To address this challenge, we present a late aggregation approach that aggregates pairwise costs throughout the network's feed-forward pass, achieving accurate estimates with only minor changes to the plain CasMVSNet. Instead of building an intermediate cost by a weighted sum, late aggregation preserves all pairwise costs along a distinct view channel. This enables the subsequent depth network to fully utilize the crucial geometric cues without loss of cost fidelity. Building on the new aggregation scheme, we propose further techniques that address view-order dependence inside the preserved cost, handle a flexible number of testing views, and improve the depth filtering process. Despite its technical simplicity, our method improves significantly upon the baseline cascade-based approach, achieving results comparable to state-of-the-art methods with favorable computational overhead.
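The contrast between the two aggregation schemes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `early_aggregate` and `preserve_pairwise_costs`, the `(C, D, H, W)` tensor layout, and the toy cost values are all assumptions made for illustration. The toy case uses one source view with a sharp matching peak and one uninformative (for example, occluded) view.

```python
import numpy as np

def early_aggregate(pairwise_costs, weights=None):
    """Early aggregation (illustrative): collapse V pairwise cost volumes,
    each of assumed shape (C, D, H, W), into one volume by a weighted sum."""
    stacked = np.stack(pairwise_costs, axis=0)           # (V, C, D, H, W)
    num_views = stacked.shape[0]
    if weights is None:
        weights = np.full(num_views, 1.0 / num_views)    # plain average
    weights = np.asarray(weights, dtype=stacked.dtype)
    # Contract the view axis: the result no longer distinguishes views.
    return np.tensordot(weights, stacked, axes=([0], [0]))  # (C, D, H, W)

def preserve_pairwise_costs(pairwise_costs):
    """Late aggregation input (illustrative): keep every pairwise cost
    along a separate view axis so a later network can combine them."""
    return np.stack(pairwise_costs, axis=0)               # (V, C, D, H, W)

# Hypothetical matching scores over 4 depth hypotheses at a single pixel.
informative = np.array([0.1, 0.9, 0.1, 0.1]).reshape(1, 4, 1, 1)
uninformative = np.array([0.5, 0.5, 0.5, 0.5]).reshape(1, 4, 1, 1)

fused = early_aggregate([informative, uninformative])
preserved = preserve_pairwise_costs([informative, uninformative])
```

In this toy case the averaged volume `fused` has a peak-to-background gap of about 0.4 along the depth axis, whereas the informative view alone has a gap of about 0.8. `preserved` keeps that original response intact in `preserved[0]`, which is the property the abstract attributes to late aggregation.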