Out-of-distribution (OOD) detection is essential for the robustness of machine learning models because it identifies samples that deviate from the training distribution. Traditional OOD detection has focused primarily on single-modality inputs such as images, but recent advances in multimodal models have shown that leveraging multiple modalities (e.g., video, optical flow, audio) can enhance detection performance. However, existing methods often overlook intra-class variability within in-distribution (ID) data, assuming that samples of the same class are perfectly cohesive and consistent. This assumption can degrade performance, especially when prediction discrepancies are amplified uniformly across all samples. To address this issue, we propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations. Our method dynamically updates the center representation of each class by measuring the variance of similar samples within each batch, enabling adaptive adjustments. This allows us to amplify prediction discrepancies based on the updated class centers rather than uniformly, thereby improving the model's robustness and generalization across different modalities. Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state of the art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection. To facilitate accessibility and reproducibility, we make our code publicly available on GitHub.
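The prototype update described above can be illustrated with a minimal sketch. It is not the released DPU implementation: the function names, the `base_rate` parameter, and the specific rule that shrinks the update step as within-batch variance grows are all assumptions chosen for illustration, and embeddings are plain Python lists rather than tensors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def batch_variance(vectors, center):
    """Mean squared Euclidean distance of the vectors from `center`."""
    total = sum(
        sum((x - c) ** 2 for x, c in zip(v, center)) for v in vectors
    )
    return total / len(vectors)


def update_prototypes(prototypes, embeddings, labels, base_rate=0.5):
    """Move each class prototype toward the mean of its samples in the batch.

    Assumed rule (hypothetical, not the paper's formula): the step size is
    base_rate / (1 + variance), so a scattered batch moves the prototype
    less than a tightly clustered one.
    """
    # Group the batch embeddings by class label.
    by_class = {}
    for emb, label in zip(embeddings, labels):
        by_class.setdefault(label, []).append(emb)

    updated = dict(prototypes)
    for label, vectors in by_class.items():
        center = mean_vector(vectors)
        if label not in updated:
            # First batch containing this class: start from the batch mean.
            updated[label] = center
            continue
        rate = base_rate / (1.0 + batch_variance(vectors, center))
        old = updated[label]
        updated[label] = [
            (1.0 - rate) * o + rate * c for o, c in zip(old, center)
        ]
    return updated


def distance_to_prototype(embedding, prototype):
    """Euclidean distance between a sample embedding and a class prototype."""
    return math.sqrt(
        sum((x - p) ** 2 for x, p in zip(embedding, prototype))
    )
```

Under these assumptions, `distance_to_prototype` could serve as a per-sample factor when amplifying prediction discrepancies, so that samples far from their updated class center are treated differently from samples close to it. How DPU actually computes that factor is not specified in the abstract.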