Foundation object detectors such as GLIP and Grounding DINO excel on general-domain data but often degrade in specialized, data-scarce settings such as underwater imagery or industrial defects. Typical cross-domain few-shot approaches rely on fine-tuning on scarce target data, incurring annotation cost and overfitting risks. We instead ask: can a frozen detector adapt with only one exemplar per class and no training? To answer this, we introduce training-free one-shot domain generalization for object detection, in which a detector must adapt to a specialized domain given only one annotated exemplar per class and no weight updates. To tackle this task, we propose LAB-Det, which exploits Language As a domain-invariant Bridge. Instead of adapting visual features, we project each exemplar into a descriptive text that conditions and guides a frozen detector. This linguistic conditioning replaces gradient-based adaptation, enabling robust generalization in data-scarce domains. We evaluate on UODD (underwater) and NEU-DET (industrial defects), two widely adopted benchmarks for data-scarce detection where object boundaries are often ambiguous, and LAB-Det achieves up to 5.4 mAP improvement over state-of-the-art fine-tuned baselines without updating a single parameter. These results establish linguistic adaptation as an efficient and interpretable alternative to fine-tuning in specialized detection settings.
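The conditioning step described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual implementation: the function and class names (`Exemplar`, `exemplar_to_prompt`, `build_detector_query`) are hypothetical, and the attribute lists stand in for what would, in practice, be produced by a captioning or vision-language model applied to each exemplar crop.

```python
# Hypothetical sketch of language-as-a-bridge conditioning: each one-shot
# exemplar is projected into a descriptive phrase, and the phrases are
# concatenated into the text query of a frozen open-vocabulary detector
# (e.g. Grounding DINO). No detector weights are ever updated.
from dataclasses import dataclass


@dataclass
class Exemplar:
    class_name: str        # e.g. "sea urchin"
    attributes: list[str]  # stand-in for attributes a captioner would extract


def exemplar_to_prompt(ex: Exemplar) -> str:
    """Project one visual exemplar into a descriptive text prompt
    (a VLM/captioner would produce this in a real pipeline)."""
    return f"{' '.join(ex.attributes)} {ex.class_name}"


def build_detector_query(prompts: list[str]) -> str:
    """Join per-class prompts with the ' . ' separator that grounding-style
    detectors commonly use to delimit category phrases."""
    return " . ".join(prompts) + " ."


exemplars = [
    Exemplar("sea urchin", ["dark", "spiky", "round"]),
    Exemplar("sea cucumber", ["elongated", "soft-bodied"]),
]
query = build_detector_query([exemplar_to_prompt(e) for e in exemplars])
print(query)
# -> dark spiky round sea urchin . elongated soft-bodied sea cucumber .
```

The resulting string would be passed as the text input of the frozen detector in place of bare class names, so all adaptation happens in the prompt rather than in the weights.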