On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration

Open-vocabulary object detection (OVD) models offer remarkable flexibility by detecting objects from arbitrary text queries. However, their zero-shot performance in specialized domains such as Remote Sensing (RS) is often compromised by the inherent ambiguity of natural language, which limits critical downstream applications. For instance, an OVD model may struggle to distinguish between fine-grained classes such as "fishing boat" and "yacht" because their embeddings are similar and often inseparable. The resulting irrelevant detections can hamper specific user goals, such as monitoring illegal fishing. To address this, we propose a cascaded approach that couples the broad generalization of a large pre-trained OVD model with a lightweight few-shot classifier. Our method first employs the zero-shot model to generate high-recall object proposals. These proposals are then refined for high precision by a compact classifier trained in real time on only a handful of user-annotated examples, drastically reducing the high cost of annotating RS imagery. The core of our framework is FLAME, a one-step active learning strategy that selects the most informative samples for training. FLAME identifies, on the fly, uncertain marginal candidates near the decision boundary using density estimation, followed by clustering to ensure sample diversity. This efficient sampling technique achieves high accuracy without costly full-model fine-tuning and enables adaptation in less than a minute, which is significantly faster than state-of-the-art alternatives. Our method consistently surpasses state-of-the-art performance on RS benchmarks, establishing a practical and resource-efficient framework for adapting foundation models to specific user needs.
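
The abstract does not say how FLAME combines density estimation and clustering, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each proposal comes with a zero-shot score and a feature vector, that the decision boundary can be approximated by the lowest-density valley of a Gaussian kernel density estimate over the scores, and that diversity is obtained by running plain k-means on the candidates closest to that valley. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random


def kde(values, bandwidth):
    """Gaussian kernel density estimate over 1-D values; returns a callable."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return lambda x: sum(
        math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
    ) / norm


def density_valley(scores, bandwidth=0.05, grid_size=101):
    """Assumed boundary: the lowest-density interior local minimum of the scores."""
    lo, hi = min(scores), max(scores)
    density = kde(scores, bandwidth)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [density(x) for x in grid]
    minima = [i for i in range(1, grid_size - 1)
              if dens[i] <= dens[i - 1] and dens[i] <= dens[i + 1]]
    if not minima:  # unimodal scores: fall back to the midpoint of the range
        return (lo + hi) / 2.0
    return grid[min(minima, key=lambda i: dens[i])]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with a seeded random initialisation."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids


def select_marginal_samples(features, scores, n_marginal=20, n_queries=4):
    """Return indices of proposals to show the user for labeling.

    Requires n_queries <= n_marginal <= len(scores).
    """
    boundary = density_valley(scores)
    # Marginal candidates: proposals whose score lies closest to the valley.
    marginal = sorted(range(len(scores)),
                      key=lambda i: abs(scores[i] - boundary))[:n_marginal]
    centroids = kmeans([features[i] for i in marginal], n_queries)
    # One representative per cluster: the unused candidate nearest each centroid.
    picked = []
    for c in centroids:
        best = min((i for i in marginal if i not in picked),
                   key=lambda i: sq_dist(features[i], c))
        picked.append(best)
    return picked


# Synthetic stand-in for OVD proposals: two score modes (negatives, positives).
rng = random.Random(0)
scores = ([rng.gauss(0.25, 0.05) for _ in range(60)]
          + [rng.gauss(0.75, 0.05) for _ in range(60)])
features = [[s + rng.gauss(0, 0.1), rng.gauss(0, 1)] for s in scores]
print(select_marginal_samples(features, scores))
```

In the cascade described above, the returned indices would be shown to the user for annotation, and the resulting handful of labels would train the compact classifier that filters the zero-shot proposals. A real system would likely use embeddings from the OVD model rather than the toy two-dimensional features used here.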

