OOVDet: Low-Density Prior Learning for Zero-Shot Out-of-Vocabulary Object Detection

Zero-shot out-of-vocabulary detection (ZS-OOVD) aims to accurately recognize objects of the in-vocabulary (IV) categories provided at zero-shot inference while rejecting undefined (out-of-vocabulary, OOV) objects that lack corresponding category prompts. However, previous methods tend to overfit the IV classes, so OOV objects are misclassified as IV classes with high confidence. To address this issue, this paper proposes a zero-shot OOV detector (OOVDet), a framework that detects predefined classes while reliably rejecting undefined ones in zero-shot scenes. Specifically, because the model has no prior knowledge of the OOV data distribution, we synthesize region-level OOV prompts by sampling from the low-likelihood regions of class-conditional Gaussian distributions in the latent space, motivated by the assumption that unknown semantics are more likely to emerge in low-density areas of that space. For OOV images, we further propose a Dirichlet-based gradient attribution mechanism to mine pseudo-OOV image samples: the attribution gradients are interpreted as Dirichlet evidence to estimate prediction uncertainty, and samples with high uncertainty are selected as pseudo-OOV images. Building on these synthesized OOV prompts and pseudo-OOV images, we construct the OOV decision boundary through a low-density prior constraint, which regularizes the optimization of OOV classes using Gaussian kernel density estimation, in accordance with the above assumption. Experimental results show that our method significantly improves OOV detection performance in zero-shot scenes. The code is available at https://github.com/binyisu/OOV-detector.
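
The three ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, parameters, and toy features are hypothetical. The evidence vector is taken as a given non-negative input rather than derived from attribution gradients, and the uncertainty formula K / sum(alpha) is the standard evidential (Dirichlet) form, assumed here for illustration.

```python
import numpy as np

def sample_low_likelihood(features, n_candidates=1000, n_keep=50, seed=0):
    """Fit a Gaussian to one class's features, draw candidates from it,
    and keep the n_keep candidates with the lowest likelihood."""
    rng = np.random.default_rng(seed)
    dim = features.shape[1]
    mean = features.mean(axis=0)
    # Small ridge term keeps the covariance invertible (an assumption).
    cov = np.cov(features, rowvar=False) + 1e-4 * np.eye(dim)
    candidates = rng.multivariate_normal(mean, cov, size=n_candidates)
    # The Gaussian log-density decreases monotonically with the squared
    # Mahalanobis distance, so the largest distances are the least likely.
    diff = candidates - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    idx = np.argsort(maha)[-n_keep:]
    return candidates[idx], maha[idx]

def dirichlet_uncertainty(evidence):
    """Uncertainty u = K / sum(alpha) with alpha = evidence + 1,
    where K is the number of classes (evidence must be non-negative)."""
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0
    return evidence.shape[-1] / alpha.sum(axis=-1)

def gaussian_kde_density(points, samples, bandwidth=1.0):
    """Isotropic Gaussian kernel density estimate of `points` under `samples`."""
    points = np.atleast_2d(points)
    samples = np.atleast_2d(samples)
    dim = samples.shape[1]
    sq_dist = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

# Toy usage: random 4-D vectors stand in for one IV class's region features.
iv_features = np.random.default_rng(1).normal(size=(200, 4))
oov_like, oov_maha = sample_low_likelihood(iv_features)
oov_density = gaussian_kde_density(oov_like, iv_features)
```

In this toy setting the retained samples lie far from the class mean and receive a lower kernel density than the mean itself, which matches the low-density assumption stated above. How such a density would enter the training objective is left to the paper.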
