SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is the result of intensive surveying, collecting, and standardizing of 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between pretraining on RGB datasets and finetuning on SAR datasets, in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspectives of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating exceptional generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection. The dataset and code are available at https://github.com/zcablii/SARDet_100K.

