Estimating a dense depth map from a single view is geometrically ill-posed, and state-of-the-art methods rely on deep neural networks to learn the relation between depth and visual appearance. Structure from Motion (SfM), on the other hand, leverages multi-view constraints to produce very accurate but sparse maps, because matching across images is typically limited to regions with locally discriminative texture. In this work, we combine the strengths of both approaches by proposing a novel test-time refinement (TTR) method, denoted SfM-TTR, that boosts the performance of single-view depth networks at test time using SfM multi-view cues. Specifically, and in contrast to the state of the art, we use sparse SfM point clouds as a test-time self-supervisory signal, fine-tuning the network encoder to learn a better representation of the test scene. Our results show that adding SfM-TTR to several state-of-the-art self-supervised and supervised networks significantly improves their performance, outperforming previous TTR baselines, which rely mainly on photometric multi-view consistency. The code is available at https://github.com/serizba/SfM-TTR.
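To make the idea of a sparse SfM self-supervisory signal concrete, the following is a minimal sketch of how such a signal could be computed for one image. It is an illustration under stated assumptions, not the released implementation: the abstract does not specify the loss or the scale-alignment procedure, so the median-ratio alignment, the L1 penalty, and the function names `align_sfm_to_prediction` and `sparse_depth_loss` are all hypothetical. The sketch uses NumPy only and stops at the loss value; actually fine-tuning the encoder would require an automatic-differentiation framework.

```python
import numpy as np


def align_sfm_to_prediction(pred_depth, sfm_depth, valid):
    """Rescale sparse SfM depths to the scale of the predicted depth.

    ASSUMPTION: SfM depth has an arbitrary scale, so we align it with the
    median of the per-pixel ratio over pixels that have an SfM point.
    """
    ratio = np.median(pred_depth[valid] / sfm_depth[valid])
    return sfm_depth * ratio


def sparse_depth_loss(pred_depth, sfm_depth, valid):
    """Mean absolute error between prediction and aligned SfM depth.

    ASSUMPTION: an L1 penalty evaluated only where SfM depth exists.
    """
    aligned = align_sfm_to_prediction(pred_depth, sfm_depth, valid)
    return float(np.mean(np.abs(pred_depth[valid] - aligned[valid])))


# Toy 2x2 example: a zero in sfm_depth marks a pixel with no SfM point.
pred = np.array([[2.0, 4.0], [6.0, 9.0]])
sfm = np.array([[1.0, 2.0], [0.0, 4.0]])
mask = sfm > 0
loss = sparse_depth_loss(pred, sfm, mask)
```

In a test-time refinement loop, a loss of this kind would be minimized for a few iterations over the test sequence while updating only the encoder parameters, which is the part of the network the abstract says is fine-tuned.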