SODAWideNet -- Salient Object Detection with an Attention augmented Wide Encoder Decoder network without ImageNet pre-training

Developing a new Salient Object Detection (SOD) model usually involves selecting an ImageNet pre-trained backbone and designing novel feature refinement modules that use the backbone features. However, adding new components to a pre-trained backbone requires retraining the whole network on the ImageNet dataset, which takes significant time. Hence, we explore developing a neural network from scratch, trained directly on SOD without ImageNet pre-training. Such a formulation offers full autonomy to design task-specific components. To that end, we propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection. We deviate from the common paradigm of narrow and deep convolutional models and adopt a wide and shallow architecture, resulting in a parameter-efficient deep neural network. To achieve a shallower network, we increase the receptive field from the beginning of the network using a combination of dilated convolutions and self-attention. Specifically, we propose the Multi Receptive Field Feature Aggregation Module (MRFFAM), which efficiently obtains discriminative features from farther regions at higher resolutions using dilated convolutions. Next, we propose Multi-Scale Attention (MSA), which creates a feature pyramid and efficiently computes attention across multiple resolutions to extract global features from larger feature maps. Finally, we propose two variants, SODAWideNet-S (3.03M parameters) and SODAWideNet (9.03M parameters), that achieve competitive performance against state-of-the-art models on five datasets.
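The claim that dilated convolutions enlarge the receptive field without adding depth can be checked with a little arithmetic. The sketch below is not the authors' code: it is a minimal illustration using the standard receptive-field recurrence for stacked convolutions. The function name `receptive_field` and the dilation rates (1, 2, 4) are assumptions chosen for the example, not values taken from the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf = 1    # with no layers, an output position sees exactly one input pixel
    jump = 1  # spacing, in input pixels, between neighbouring positions at the current depth
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions.
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three 3x3 convolutions with
# hypothetical dilation rates 1, 2, 4 (illustrative values only).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and the same number of weights per kernel, the dilated stack covers roughly twice the extent of the plain one at full resolution, which is the effect the abstract relies on to keep the network shallow.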

