Robust depth estimation under dynamic and adverse lighting conditions is essential for robotic systems. Depth foundation models such as Depth Anything achieve great success in ideal scenes but struggle under adverse imaging conditions such as extreme illumination and motion blur. These degradations corrupt the visual signals of frame cameras, weakening the discriminative features that frame-based depth estimation relies on across both the spatial and temporal dimensions. Existing approaches typically incorporate event cameras, exploiting their high dynamic range and high temporal resolution to compensate for the corrupted frame features. However, such specialized fusion models are predominantly trained from scratch on domain-specific datasets and therefore fail to inherit the open-world knowledge and robust generalization of foundation models. In this work, we propose ADAE, an event-guided spatiotemporal fusion framework for Depth Anything in degraded scenes. Our design is guided by two key insights: 1) Entropy-Aware Spatial Fusion: we adaptively merge frame-based and event-based features, using an information-entropy measure to indicate illumination-induced degradation. 2) Motion-Guided Temporal Correction: we exploit event-based motion cues to recalibrate ambiguous features in blurred regions. Within our unified framework, the two components complement each other and jointly enhance Depth Anything under adverse imaging conditions. Extensive experiments verify the superiority of the proposed method. Our code will be released upon acceptance.
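To make the entropy-aware spatial fusion concrete, the following is a minimal, hypothetical PyTorch sketch of how an information-entropy weight could blend frame and event features. The function names (patch_entropy, entropy_fusion), the patch-histogram entropy estimate, and the specific weighting rule are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal, illustrative sketch of entropy-aware spatial fusion (NOT the
# authors' implementation). All names and the weighting rule are assumptions.
import torch
import torch.nn.functional as F


def patch_entropy(gray: torch.Tensor, patch: int = 16, bins: int = 32) -> torch.Tensor:
    """Shannon entropy of intensity histograms over non-overlapping patches.

    gray: (B, 1, H, W) frame intensities in [0, 1], with H and W divisible by `patch`.
    Returns: (B, 1, H//patch, W//patch) entropy map; low entropy suggests
    under-/over-exposed (information-poor) regions of the frame.
    """
    B, _, H, W = gray.shape
    # Gather the pixels of each patch: (B, patch*patch, num_patches)
    patches = F.unfold(gray, kernel_size=patch, stride=patch)
    # Quantize intensities into histogram bins and build per-patch histograms
    idx = (patches.clamp(0, 1) * (bins - 1)).long()          # (B, P, N)
    onehot = F.one_hot(idx, bins).float()                    # (B, P, N, bins)
    hist = onehot.mean(dim=1)                                 # (B, N, bins) bin probabilities
    ent = -(hist * (hist + 1e-8).log()).sum(dim=-1)           # (B, N)
    return ent.view(B, 1, H // patch, W // patch)


def entropy_fusion(frame_feat: torch.Tensor,
                   event_feat: torch.Tensor,
                   gray: torch.Tensor,
                   patch: int = 16) -> torch.Tensor:
    """Blend frame and event features with an entropy-derived spatial weight.

    Low-entropy (degraded) frame regions lean on event features; well-exposed
    regions keep the frame features.
    """
    ent = patch_entropy(gray, patch)                                   # (B, 1, h, w)
    ent = ent / ent.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)   # normalize to [0, 1]
    w_event = 1.0 - F.interpolate(ent, size=frame_feat.shape[-2:], mode="bilinear")
    return w_event * event_feat + (1.0 - w_event) * frame_feat
```

In this sketch, low-entropy patches (e.g., saturated or under-exposed regions) receive a larger event-feature weight, matching the intuition stated above that entropy can indicate illumination-induced degradation; the motion-guided temporal correction would analogously derive a blur indicator from event-based motion cues.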