Out-of-distribution (OOD) detection is essential for reliable and trustworthy machine learning. Recent multi-modal OOD detection methods leverage textual information from in-distribution (ID) class names for visual OOD detection, yet they neglect the rich contextual information of ID classes. Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class. However, our analysis shows that using such knowledge indiscriminately causes catastrophic degradation of OOD detection because of LLM hallucinations. In this paper, we propose to apply world knowledge to enhance OOD detection through selective generation from LLMs. Specifically, we introduce a consistency-based uncertainty calibration method to estimate a confidence score for each generation. We further extract visual objects from each image to fully capitalize on this world knowledge. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art methods.
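The abstract does not specify how the consistency-based confidence is computed. The following is a minimal sketch of one generic form of consistency-based selective generation, not the paper's exact procedure. It assumes that the LLM is sampled K times per ID class, that each sample is a list of descriptor strings, and that agreement is measured by mean pairwise Jaccard similarity. The function names, the 0.5 threshold, the majority-vote filter, and the prompt template are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def normalize(descriptors: list[str]) -> set[str]:
    """Lowercase and strip descriptors so trivially different strings match."""
    return {d.strip().lower() for d in descriptors if d.strip()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two descriptor sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency_confidence(samples: list[list[str]]) -> float:
    """Hypothetical confidence: mean pairwise Jaccard similarity of K samples.

    Consistent samples suggest grounded knowledge; divergent samples
    suggest hallucination.
    """
    sets = [normalize(s) for s in samples]
    if len(sets) < 2:
        raise ValueError("need at least two sampled generations")
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_descriptors(class_name: str, samples: list[list[str]],
                       threshold: float = 0.5) -> tuple[list[str], float]:
    """Keep LLM descriptors only when the generations agree enough.

    If confidence >= threshold, keep descriptors that appear in a strict
    majority of samples. Otherwise, fall back to the plain class-name prompt.
    """
    conf = consistency_confidence(samples)
    if conf < threshold:
        return [f"a photo of a {class_name}"], conf
    sets = [normalize(s) for s in samples]
    counts = Counter(d for s in sets for d in s)
    kept = sorted(d for d, c in counts.items() if c > len(sets) / 2)
    return [f"a photo of a {class_name}, which has {d}" for d in kept], conf


if __name__ == "__main__":
    cat = [["whiskers", "pointed ears", "fur"],
           ["Whiskers", "fur", "a long tail"],
           ["whiskers", "pointed ears", "fur"]]
    prompts, conf = select_descriptors("cat", cat)
    print(round(conf, 3), prompts)

    # Mutually inconsistent generations are treated as likely hallucinations.
    zorb = [["wings"], ["gills"], ["six legs"]]
    print(select_descriptors("zorb", zorb))
```

In this toy run, the three "cat" samples have a mean pairwise similarity of about 0.667, so the majority descriptors are kept. The fictitious "zorb" samples share nothing, so only the class-name prompt survives. A real system would score semantic rather than exact-string agreement. The Jaccard measure is used here only to keep the sketch self-contained.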