RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection

The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy for the limited application scenarios of traditional RGB cameras. RGB-X tasks, which rely on RGB input together with another type of data input to solve specific problems, have become a popular research topic in multimedia. A crucial question in two-branch RGB-X deep neural networks is how to fuse information across modalities. Given the large amount of information inside RGB-X networks, previous works typically apply naive fusion (e.g., average or max fusion) or focus only on feature fusion at the same scale(s). In this paper, we propose a novel method called RXFOOD, which fuses features across different scales within the same modality branch and across different modality branches simultaneously, in a unified attention mechanism. An Energy Exchange Module is designed for the interaction of each feature map's energy matrix, which reflects the inter-relationship of different positions and different channels inside a feature map. The RXFOOD method can be easily incorporated into any dual-branch encoder-decoder network as a plug-in module, and helps the original backbone network better focus on important positions and channels for object of interest detection. Experimental results on RGB-NIR salient object detection, RGB-D salient object detection, and RGB-Frequency image manipulation detection demonstrate the effectiveness of the proposed RXFOOD.
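To make the contrast in the abstract concrete, the following minimal sketch shows the naive average and max fusion baselines next to one possible reading of an "energy matrix". It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the energy matrix, so the code assumes Gram-style dot products between channels and between positions, as is common in self-attention designs. The function names and the C x N nested-list layout (C channels, N flattened spatial positions) are hypothetical, and the exchange step itself is not sketched because the abstract does not specify it.

```python
# Illustrative sketch only: names, shapes, and the Gram-style energy
# definition are assumptions, not taken from the RXFOOD paper.

def average_fusion(rgb, x):
    """Naive fusion: element-wise mean of two C x N feature maps."""
    return [[(a + b) / 2 for a, b in zip(r, s)] for r, s in zip(rgb, x)]

def max_fusion(rgb, x):
    """Naive fusion: element-wise maximum of two C x N feature maps."""
    return [[max(a, b) for a, b in zip(r, s)] for r, s in zip(rgb, x)]

def channel_energy(feat):
    """Assumed C x C energy: dot product between every pair of channels."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in feat] for ci in feat]

def position_energy(feat):
    """Assumed N x N energy: dot product between every pair of positions."""
    cols = list(zip(*feat))  # transpose to N position vectors of length C
    return [[sum(a * b for a, b in zip(pi, pj)) for pj in cols] for pi in cols]

# Toy feature maps: 2 channels, 3 flattened spatial positions.
rgb_feat = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
nir_feat = [[3.0, 0.0, 1.0], [2.0, 1.0, 4.0]]

fused_avg = average_fusion(rgb_feat, nir_feat)
fused_max = max_fusion(rgb_feat, nir_feat)
e_channel = channel_energy(rgb_feat)    # relations between channels
e_position = position_energy(rgb_feat)  # relations between positions
```

Average and max fusion combine the two branches element by element and ignore how channels or positions relate to one another. The energy matrices, under the assumption above, expose exactly those relationships, which is the kind of information the abstract says the Energy Exchange Module operates on.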

