Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, their performance in practical applications is limited by the inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose FE-DeTr, a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams. The network leverages temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.
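To make the tracking idea concrete, the following is a minimal sketch of one possible spatio-temporal nearest-neighbor linking scheme. It is not the FE-DeTr implementation: the abstract does not specify the search procedure, so the function name `track_keypoints`, the `(x, y, t)` keypoint layout, and the `max_dist` and `max_dt` thresholds are all assumptions chosen for illustration.

```python
import numpy as np


def track_keypoints(seeds, detections, max_dist=5.0, max_dt=0.01):
    """Greedy spatio-temporal nearest-neighbor linking (illustrative only).

    seeds:      (N, 3) array of (x, y, t) keypoints that start the tracks.
    detections: (M, 3) array of (x, y, t) keypoints detected later.
    max_dist:   assumed spatial gate in pixels.
    max_dt:     assumed temporal gate in seconds.
    Returns a list of N tracks, each a list of (x, y, t) tuples.
    """
    seeds = np.asarray(seeds, dtype=float)
    detections = np.asarray(detections, dtype=float)
    tracks = []
    for seed in seeds:
        track = [tuple(seed)]
        last = seed
        while True:
            # Keep only detections strictly later than the last point
            # and inside the temporal window.
            dt = detections[:, 2] - last[2]
            mask = (dt > 0) & (dt <= max_dt)
            if not mask.any():
                break
            candidates = detections[mask]
            # Among those, pick the spatially nearest one.
            dist = np.linalg.norm(candidates[:, :2] - last[:2], axis=1)
            j = int(np.argmin(dist))
            if dist[j] > max_dist:
                break  # nearest candidate is too far away: end the track
            last = candidates[j]
            track.append(tuple(last))
        tracks.append(track)
    return tracks
```

Each track advances strictly forward in time, so the loop terminates once no detection falls inside both gates. In this sketch the temporal gate exploits the dense timestamps that event-driven detections can provide, while the spatial gate rejects distant, likely spurious responses.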