Depth estimation under adverse conditions remains a significant challenge. Recently, multi-spectral depth estimation, which integrates both visible light and thermal images, has shown promise in addressing this issue. However, existing algorithms struggle with precise pixel-level feature matching, limiting their ability to fully exploit geometric constraints across different spectra. To address this, we propose a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints. In particular, we treat the visible light and thermal images as a stereo pair and utilize a Cross-modal Feature Matching (CFM) Module to construct a cost volume for pixel-level matching. To mitigate the effects of poor lighting on stereo matching, we introduce Degradation Masking, which leverages robust monocular thermal depth estimation in degraded regions. Our method achieves state-of-the-art (SOTA) performance on the Multi-Spectral Stereo (MS2) dataset, with qualitative evaluations demonstrating high-quality depth maps under varying lighting conditions.
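As a rough illustration of the two ideas above, the following is a minimal sketch only, not the implementation of the CFM Module or of Degradation Masking, whose details the abstract does not give. It assumes per-pixel feature maps have already been extracted from a rectified visible/thermal pair, builds a generic correlation cost volume over candidate disparities, and then replaces stereo depth with monocular thermal depth wherever a degradation mask is set. The function names `build_cost_volume` and `fuse_with_degradation_mask`, the `(C, H, W)` feature layout, and the way the mask is supplied are all assumptions made for illustration.

```python
import numpy as np


def build_cost_volume(feat_rgb, feat_thr, max_disp):
    """Generic correlation cost volume (hypothetical stand-in for CFM).

    feat_rgb, feat_thr: feature maps of shape (C, H, W), with the visible
    image treated as the left view and the thermal image as the right view.
    Returns an array of shape (max_disp, H, W); higher values mean a better
    match between left pixel x and right pixel x - d.
    """
    _, h, w = feat_rgb.shape
    volume = np.zeros((max_disp, h, w), dtype=feat_rgb.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (feat_rgb * feat_thr).mean(axis=0)
        else:
            # Compare left column x with right column x - d; columns with
            # x < d have no counterpart and keep a score of zero.
            volume[d, :, d:] = (feat_rgb[:, :, d:] * feat_thr[:, :, :-d]).mean(axis=0)
    return volume


def fuse_with_degradation_mask(stereo_depth, mono_thermal_depth, degraded):
    """Use monocular thermal depth where `degraded` is True, stereo depth elsewhere.

    How the mask is obtained is not specified in the abstract; here it is
    simply an input boolean array of shape (H, W).
    """
    return np.where(degraded, mono_thermal_depth, stereo_depth)
```

In this sketch a disparity map can be read off with `volume.argmax(axis=0)`. A learned pipeline would more plausibly regress disparity from the volume with a network, so the argmax is only a stand-in for that step.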