Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interactions between concepts: for example, correcting one predicted concept ("yellow breast") does not help correct highly correlated concepts ("yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept, "black crown"?), and therefore fail to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are all represented as conditional probabilities, which are generated by composing different energy functions. ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms state-of-the-art methods on real-world datasets.
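To make the idea of "composing energy functions into conditional probabilities" concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation: the energy tables are random stand-ins for neural network outputs at one fixed input, the additive decomposition in `joint_energy` is a hypothetical choice, and all names (`e_concept`, `e_class`, `e_global`, `boltzmann`) are assumptions made for illustration. It only shows how prediction, concept correction, and a conditional dependency query can all be read off one joint energy table via p ∝ exp(-E).

```python
import itertools
import numpy as np


def boltzmann(energies):
    """Map energies to probabilities with p_i proportional to exp(-E_i)."""
    logits = -np.asarray(energies, dtype=float)
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def joint_energy(c, y, e_concept, e_class, e_global):
    """Hypothetical composition for a fixed input x:
    per-concept terms + a class term + concept-class interaction terms."""
    concept_term = sum(e_concept[k, c_k] for k, c_k in enumerate(c))
    interaction = sum(e_global[k, c_k, y] for k, c_k in enumerate(c))
    return concept_term + e_class[y] + interaction


rng = np.random.default_rng(0)
K, M = 3, 2  # toy sizes: 3 binary concepts, 2 classes

# Random stand-ins for what energy networks would output at one input x.
e_concept = rng.normal(size=(K, 2))
e_class = rng.normal(size=M)
e_global = rng.normal(size=(K, 2, M))

# Enumerate every concept vector and build the joint table over (c, y).
configs = list(itertools.product([0, 1], repeat=K))
E = np.array([[joint_energy(c, y, e_concept, e_class, e_global)
               for y in range(M)] for c in configs])
p_joint = boltzmann(E.ravel()).reshape(E.shape)  # p(c, y | x)

# Prediction: p(y | x), marginalizing over all concept vectors.
p_y = p_joint.sum(axis=0)

# Concept correction: clamp concept 0 to 1 and recompute p(y | x, c_0 = 1).
c0_on = np.array([c[0] == 1 for c in configs])
p_y_given_c0 = p_joint[c0_on].sum(axis=0) / p_joint[c0_on].sum()

# Conditional dependency: p(c_1 = 1 | x, y = 0, c_0 = 1).
c1_on = np.array([c[1] == 1 for c in configs])
p_c1_given_y0_c0 = p_joint[c0_on & c1_on, 0].sum() / p_joint[c0_on, 0].sum()

print("p(y | x)          =", p_y)
print("p(y | x, c_0=1)   =", p_y_given_c0)
print("p(c_1=1 | y=0, c_0=1) =", p_c1_given_y0_c0)
```

Exhaustive enumeration is only feasible here because the toy has 2^3 concept vectors; with realistic numbers of concepts, the normalization would need an approximate or optimization-based inference scheme instead.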