Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

This paper addresses cross-domain few-shot object detection (CD-FSOD), which aims to develop an accurate object detector for novel domains from minimal labeled examples. Transformer-based open-set detectors, e.g., DE-ViT~\cite{zhang2023detect}, have excelled in both open-vocabulary object detection and traditional few-shot object detection, detecting categories beyond those seen during training. This naturally raises two key questions: 1) Can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can the results of open-set methods be enhanced when they face significant domain gaps? To address the first question, we introduce several metrics to quantify domain variances and establish a new CD-FSOD benchmark with diverse domain metric values. We evaluate several state-of-the-art (SOTA) open-set object detection methods on this benchmark and observe evident performance degradation across out-of-domain datasets, which indicates that open-set detectors fail when adopted directly for CD-FSOD. Subsequently, to overcome this performance degradation and answer the second question, we enhance the vanilla DE-ViT. With several novel components, including finetuning, a learnable prototype module, and a lightweight attention module, we present an improved Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO). Experiments show that CD-ViTO achieves impressive results on both out-of-domain and in-domain target datasets, establishing new SOTAs for both CD-FSOD and FSOD. All datasets, code, and models will be released to the community.


