In frame-based vision, object detection suffers substantial performance degradation under challenging conditions due to the limited sensing capability of conventional cameras. Event cameras output sparse, asynchronous events, offering a potential solution to these problems. However, effectively fusing these two heterogeneous modalities remains an open issue. In this work, we propose a novel hierarchical feature refinement network for event-frame fusion. Its core component is a coarse-to-fine fusion module, the cross-modality adaptive feature refinement (CAFR) module. In the initial phase, the bidirectional cross-modality interaction (BCI) part bridges information between the two distinct sources. Subsequently, the features are further refined by aligning the channel-level mean and variance in the two-fold adaptive feature refinement (TAFR) part. We conduct extensive experiments on two benchmarks: the low-resolution PKU-DDD17-Car dataset and the high-resolution DSEC dataset. Experimental results show that our method surpasses the state-of-the-art by an impressive margin of $\textbf{8.0}\%$ on the DSEC dataset. Moreover, our method exhibits significantly better robustness (\textbf{69.5}\% versus \textbf{38.7}\%) when 15 different corruption types are applied to the frame images. The code is available at https://github.com/HuCaoFighting/FRN.
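To make the channel-level statistic alignment in TAFR concrete, the sketch below shows one plausible realization: normalizing one modality's feature map per channel and re-scaling it to the reference modality's per-channel mean and standard deviation (an AdaIN-style operation). The function name, NCHW layout, and `eps` value are illustrative assumptions; the paper's exact TAFR formulation may differ.

```python
import numpy as np

def align_channel_stats(feat, ref, eps=1e-5):
    """Align per-channel mean/variance of `feat` to those of `ref`.

    feat, ref: arrays of shape (N, C, H, W).
    Illustrative sketch of channel-level statistic alignment;
    not the paper's verbatim TAFR implementation.
    """
    axes = (2, 3)  # spatial dimensions; statistics are per sample, per channel
    mean = feat.mean(axis=axes, keepdims=True)
    std = np.sqrt(feat.var(axis=axes, keepdims=True) + eps)
    ref_mean = ref.mean(axis=axes, keepdims=True)
    ref_std = np.sqrt(ref.var(axis=axes, keepdims=True) + eps)
    # Whiten `feat` per channel, then re-color with the reference statistics.
    return (feat - mean) / std * ref_std + ref_mean
```

After this operation, each channel of the output carries the reference modality's first- and second-order statistics while preserving the input's spatial structure.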