Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models

Out-of-distribution (OOD) detection aims to identify samples that deviate from the in-distribution (ID) data. One popular pipeline addresses this by introducing negative labels that are distant from the ID classes and detecting OOD samples based on their distance to these labels. However, such labels may be only weakly activated by OOD samples and thus fail to capture OOD characteristics. To address this, we propose \underline{T}est-time \underline{A}ctivated \underline{N}egative \underline{L}abels (TANL), which dynamically evaluates activation levels across the corpus dataset and mines candidate labels with high activation responses during testing. Specifically, TANL identifies high-confidence test images online and accumulates their assignment probabilities over the corpus to construct a label activation metric. This metric leverages historical test samples to adaptively align with the test distribution, enabling the selection of distribution-adaptive activated negative labels. By further exploiting the activation information within the current test batch, we introduce a more fine-grained, batch-adaptive variant. To fully utilize label activation knowledge, we propose an activation-aware score function that emphasizes negative labels with stronger activations, boosting performance and improving robustness to the number of negative labels. TANL is training-free, test-efficient, and grounded in theoretical justification. Experiments on diverse backbones and a wide range of task settings validate its effectiveness. Notably, on the large-scale ImageNet benchmark, TANL significantly reduces the FPR95 from 17.5\% to 9.8\%. Code is available at \href{https://github.com/YBZh/OpenOOD-VLM}{YBZh/OpenOOD-VLM}.
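
To make the pipeline described above more concrete, the following is a minimal NumPy sketch of the three steps: accumulating a label activation metric online, selecting activated negative labels, and computing an activation-aware score. It is an illustration under stated assumptions, not the released implementation. The abstract does not give the exact formulas, so the class `ActivationTracker`, the function `activation_aware_score`, the confidence rule (ID probability mass below `ood_threshold`), and the mean-normalized weighting are all hypothetical choices made here for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ActivationTracker:
    """Hypothetical running activation metric over a candidate label corpus."""

    def __init__(self, num_corpus, ood_threshold=0.5):
        self.activation = np.zeros(num_corpus)  # accumulated probability per corpus label
        self.count = 0                          # number of images accumulated so far
        self.ood_threshold = ood_threshold      # assumed confidence rule, not from the paper

    def update(self, id_logits, corpus_logits):
        """Accumulate corpus assignment probabilities of confidently OOD-like images."""
        n_id = id_logits.shape[1]
        probs = softmax(np.concatenate([id_logits, corpus_logits], axis=1))
        id_mass = probs[:, :n_id].sum(axis=1)
        # Assumption: an image is "high-confidence" when little mass falls on ID labels.
        confident = id_mass < self.ood_threshold
        self.activation += probs[confident, n_id:].sum(axis=0)
        self.count += int(confident.sum())

    def select(self, k):
        """Return indices of the k corpus labels with the highest accumulated activation."""
        return np.argsort(-self.activation)[:k]


def activation_aware_score(id_logits, corpus_logits, neg_idx, activation):
    """ID-ness score in [0, 1]; negatives are weighted by their activation (assumed form)."""
    n_id = id_logits.shape[1]
    w = activation[neg_idx].astype(float)
    # Normalize weights to mean 1; fall back to uniform weights before any accumulation.
    w = w / w.mean() if w.mean() > 0 else np.ones_like(w)
    probs = softmax(np.concatenate([id_logits, corpus_logits[:, neg_idx]], axis=1))
    id_mass = probs[:, :n_id].sum(axis=1)
    neg_mass = (probs[:, n_id:] * w).sum(axis=1)
    return id_mass / (id_mass + neg_mass)
```

In this sketch, `id_logits` and `corpus_logits` stand for image-text similarities (for example from a CLIP-style model) against the ID class names and the candidate corpus labels, respectively. A higher score indicates a more ID-like sample, and calling `update` on each incoming batch before scoring lets the selected negatives track the test distribution.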
