Out-of-distribution (OOD) detection is crucial for deploying machine learning models in the open world. Existing OOD detectors are effective at identifying OOD samples that deviate significantly from in-distribution (ID) data, but they come with trade-offs. Deep OOD detectors usually incur high computational costs, require hyperparameter tuning, and offer limited interpretability, whereas traditional OOD detectors may have low accuracy on large, high-dimensional datasets. To address these limitations, we propose a novel and effective OOD detection approach that uses an overlap index (OI)-based confidence score function to evaluate the likelihood that a given input belongs to the same distribution as the available ID samples. The proposed OI-based confidence score function is non-parametric, lightweight, and easy to interpret, and therefore flexible and general. Extensive empirical evaluations indicate that our OI-based OOD detector is competitive with state-of-the-art OOD detectors in detection accuracy on a wide range of datasets while requiring lower computation and memory costs. Lastly, we show that the proposed OI-based confidence score function inherits desirable properties from OI (e.g., insensitivity to small distributional variations and robustness against Huber $\epsilon$-contamination) and is a versatile tool for estimating OI and model accuracy in specific contexts.
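For intuition about the robustness property mentioned above, the following is a short sketch under an assumption: the abstract does not define OI, so we take the standard overlap between two densities $p$ and $q$ (of distributions $P$ and $Q$) as the definition; the symbols $p$, $q$, $r$, and $\mathrm{TV}$ are introduced here for illustration only.

$$
\mathrm{OI}(P, Q) = \int \min\{p(x),\, q(x)\}\, dx = 1 - \mathrm{TV}(P, Q), \qquad 0 \le \mathrm{OI}(P, Q) \le 1 .
$$

Under Huber $\epsilon$-contamination, $q(x) = (1-\epsilon)\,p(x) + \epsilon\, r(x)$ for an arbitrary contaminating density $r$. Both $p(x)$ and $q(x)$ are at least $(1-\epsilon)\,p(x)$, so

$$
\mathrm{OI}(P, Q) \ge \int (1-\epsilon)\, p(x)\, dx = 1 - \epsilon .
$$

Under this assumed definition, contaminating a fraction $\epsilon$ of the data can reduce the overlap by at most $\epsilon$, regardless of where the contaminating mass lies.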