In self-supervised monocular depth estimation, discrete disparity prediction has been shown to yield higher-quality depth maps than the common continuous regression approaches. However, current discretization strategies typically divide a scene's depth range into bins in a handcrafted, rigid manner, limiting model performance. In this paper, we propose a learnable module, the Adaptive Discrete Disparity Volume (ADDV), which dynamically senses the depth distribution of each RGB image and generates adaptive bins for it. Without any extra supervision, the module can be integrated into existing CNN architectures, allowing the network to produce a representative value for each bin and a probability volume over the bins. Furthermore, we introduce two novel training strategies, uniformizing and sharpening, implemented through a loss term and a temperature parameter, respectively; they regularize training under self-supervised conditions and prevent model degradation or collapse. Empirical results demonstrate that ADDV effectively exploits global information, generating appropriate bins for diverse scenes and producing higher-quality depth maps than handcrafted discretization.
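To make the core idea concrete, the following is a minimal sketch of how adaptive binning and a temperature-sharpened probability volume could combine into a per-pixel depth estimate. The function names, the softmax parameterization of bin widths, and the depth range are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_depth(bin_logits, prob_logits, d_min=0.1, d_max=100.0, tau=1.0):
    """Illustrative sketch (assumed parameterization, not the paper's exact one).

    bin_logits:  (N,) per-image logits mapped to normalized bin widths,
                 so the bin layout adapts to each image's depth distribution.
    prob_logits: (H, W, N) per-pixel logits over the N bins.
    tau:         temperature; tau < 1 sharpens the probability volume.
    Returns a (H, W) depth map as the expectation over bin centers.
    """
    widths = softmax(bin_logits)                      # adaptive widths, sum to 1
    edges = d_min + (d_max - d_min) * np.cumsum(np.concatenate([[0.0], widths]))
    centers = 0.5 * (edges[:-1] + edges[1:])          # representative value per bin
    probs = softmax(prob_logits / tau, axis=-1)       # sharpened probability volume
    return (probs * centers).sum(axis=-1)             # per-pixel expected depth
```

A uniformizing regularizer, in this framing, would penalize the bin-width distribution `widths` for collapsing onto a few bins, while lowering `tau` pushes each pixel's distribution toward a single bin.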