Event-guided motion deblurring reconstructs sharp images using the high-temporal-resolution motion cues from event cameras. However, in real-world capture, thresholding-induced event under-reporting leaves motion cues missing and fragmented, under which existing methods often degrade due to two limitations: i) the assumption of dense and stable events, and ii) modality-indiscriminate extraction and fusion that fail to separate useful motion cues from disrupted events, allowing them to contaminate cross-modal representations. In this paper, we first introduce a Robustness-Oriented Perturbation Strategy (RPS) that mimics the varying trigger thresholds of dynamic vision sensors, exposing our model to diverse under-reporting patterns and thereby improving robustness under unknown conditions. Building on this strategy, we propose RED, a Robust Event-guided Deblurring network, which follows the principle of disentangling first and then fusing selectively. Specifically, the Modality-specific Representation Mechanism disentangles the inputs into image-semantic, event-motion, and cross-modal representations, capturing appearance, motion, and complementary interactions, respectively. With these reliably disentangled features, we selectively fuse modalities to enhance motion-sensitive areas in blurry images and to enrich under-reported events with semantic context. Extensive experiments on synthetic and real-world datasets demonstrate that RED consistently achieves state-of-the-art performance in both accuracy and robustness.
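To make the threshold-perturbation idea concrete, the following is a minimal sketch of how a robustness-oriented perturbation of this kind could be emulated on an event voxel grid. The function name `rps_perturb`, the voxel-grid input format, and the scale/dropout scheme are illustrative assumptions, not the paper's exact RPS.

```python
import torch

def rps_perturb(voxel: torch.Tensor,
                scale_range=(0.5, 2.0),
                max_drop: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch: emulate varying DVS trigger thresholds on an
    event voxel grid of shape (B, T, H, W).

    (i) A per-sample global rescaling stands in for a random trigger
        threshold: a higher threshold fires fewer events, so attenuating
        accumulated counts mimics under-reporting.
    (ii) Spatially random dropout stands in for fragmented firing, where
        some pixels fail to report events at all.
    """
    b = voxel.size(0)
    # (i) per-sample scale mimicking a randomly perturbed trigger threshold
    scale = torch.empty(b, 1, 1, 1, device=voxel.device).uniform_(*scale_range)
    voxel = voxel / scale
    # (ii) per-sample dropout rate in [0, max_drop), zeroing random bins
    p = torch.rand(b, 1, 1, 1, device=voxel.device) * max_drop
    mask = (torch.rand_like(voxel) > p).float()
    return voxel * mask
```

Applying such a perturbation only at training time exposes the network to diverse under-reporting patterns while leaving inference inputs untouched.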
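Likewise, a minimal structural sketch of the "disentangle first, then fuse selectively" principle is given below. The three-branch layout mirrors the abstract, while the specific convolutional layers and sigmoid gating are hypothetical stand-ins for RED's actual modules.

```python
import torch
import torch.nn as nn

class DisentangleThenFuse(nn.Module):
    """Illustrative sketch only: three disentangled branches
    (image-semantic, event-motion, cross-modal) followed by gated,
    selective fusion. Not the paper's exact RED architecture."""

    def __init__(self, c: int = 64, voxel_bins: int = 5):
        super().__init__()
        self.img_enc = nn.Conv2d(3, c, 3, padding=1)            # image-semantic branch
        self.evt_enc = nn.Conv2d(voxel_bins, c, 3, padding=1)   # event-motion branch
        self.cross_enc = nn.Conv2d(2 * c, c, 3, padding=1)      # cross-modal branch
        # gates decide where each modality is allowed to contribute
        self.evt_gate = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.img_gate = nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.head = nn.Conv2d(c, 3, 3, padding=1)

    def forward(self, blurry: torch.Tensor, voxel: torch.Tensor) -> torch.Tensor:
        f_img = self.img_enc(blurry)   # appearance cues
        f_evt = self.evt_enc(voxel)    # motion cues
        f_cross = self.cross_enc(torch.cat([f_img, f_evt], dim=1))
        # enhance motion-sensitive image regions with gated event features,
        # and enrich under-reported events with gated semantic context
        f_img = f_img + self.evt_gate(f_cross) * f_evt
        f_evt = f_evt + self.img_gate(f_cross) * f_img
        return blurry + self.head(f_img + f_evt)  # residual sharp estimate
```

The gating keeps fusion selective: disrupted event features are attenuated before they can contaminate the image branch, rather than being mixed in indiscriminately.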