Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While many methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing how effectively a given set of concepts explains an OOD detector: 1) detection completeness, which quantifies the sufficiency of the concepts for explaining the OOD detector's decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfies the desired properties of high detection completeness and high concept separability, and we demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. We also show how to identify the prominent concepts that contribute to the detection results, and we provide further reasoning about the detectors' decisions.