Video object segmentation (VOS) remains challenging under low-light conditions, where degraded image quality reduces the accuracy of the similarity computation between query and memory frames. Event cameras, with their high dynamic range and ability to capture object motion, have the potential to improve object visibility and assist VOS methods in such conditions. This paper introduces a framework for low-light VOS that uses event camera data to improve segmentation accuracy. Our approach has two key components: the Adaptive Cross-Modal Fusion (ACMF) module, which extracts relevant features while fusing the image and event modalities to mitigate noise interference, and the Event-Guided Memory Matching (EGMM) module, which corrects the inaccurate matching that is prevalent in low-light settings. We also construct a synthetic LLE-DAVIS dataset and collect a real-world LLE-VOS dataset, comprising frames and events. Experiments on both datasets demonstrate the effectiveness of our method in low-light scenarios.
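
To make the query-to-memory similarity computation concrete, the following is a minimal sketch of generic softmax dot-product matching, a formulation common in memory-based VOS. It is an illustrative assumption and not the EGMM module itself: the function name `match_query_to_memory`, the scaling by the square root of the feature dimension, and the toy feature vectors are all hypothetical. Under low light, noisy key features perturb the scores computed here, which is the kind of inaccurate matching the abstract refers to.

```python
import math


def match_query_to_memory(query, memory_keys, memory_values):
    """Read out a value for one query location from a feature memory.

    query:         list of floats, key feature of one query-frame location
    memory_keys:   list of key features, one per memory location
    memory_values: list of value features, one per memory location

    Hypothetical helper for illustration; not the paper's EGMM module.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product similarity between the query and every memory key.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale for key in memory_keys
    ]
    # Numerically stable softmax over the memory locations.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The readout is the similarity-weighted sum of the memory values.
    dim = len(memory_values[0])
    readout = [
        sum(w * value[d] for w, value in zip(weights, memory_values))
        for d in range(dim)
    ]
    return weights, readout


# Toy usage: three memory locations with 3-D keys and 2-D values.
toy_keys = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
toy_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_weights, toy_readout = match_query_to_memory(
    [0.0, 10.0, 0.0], toy_keys, toy_values
)
```

In this sketch the query is matched to the memory location whose key it most resembles, and the corresponding value dominates the readout. When key features are corrupted, the weights spread toward wrong locations, so any additional cue that stabilizes the scores, such as event data in the framework described above, acts at this step.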