Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interactions between concepts: for example, correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"?), and therefore fail to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are all represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state of the art on real-world datasets.
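To make the idea of a "unified interface" concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard energy-based convention that probability is proportional to exp(-energy), and it replaces the neural networks with small hand-written toy energies for one fixed input. The three-term decomposition, the function names, and all numeric values are hypothetical and chosen only for illustration. The point is that class prediction, concept correction, and conditional dependency queries all reduce to the same operation: conditioning one joint distribution over (concepts, class).

```python
import itertools
import math

# Hypothetical toy setup: one fixed input x, 2 classes, 3 binary concepts.
CLASSES = [0, 1]
NUM_CONCEPTS = 3

def class_energy(y):
    # Toy stand-in for a learned energy between the input and class y.
    return [0.5, 1.5][y]

def concept_energy(c):
    # Toy stand-in for learned energies between the input and each concept.
    # Each pair is (energy if concept is 0, energy if concept is 1).
    per_concept = [(0.2, 1.0), (0.9, 0.3), (0.6, 0.6)]
    return sum(per_concept[k][v] for k, v in enumerate(c))

def global_energy(c, y):
    # Toy stand-in for a concept-class compatibility energy: each class has
    # an assumed prototype concept pattern, and mismatches cost energy.
    prototypes = {0: (1, 0, 0), 1: (0, 1, 1)}
    mismatches = sum(a != b for a, b in zip(c, prototypes[y]))
    return 0.8 * mismatches

def joint_energy(c, y):
    # Compose the individual energies into one joint energy.
    return class_energy(y) + concept_energy(c) + global_energy(c, y)

def joint_distribution():
    # Enumerate every (concepts, class) state and normalize exp(-energy).
    states = [
        (c, y)
        for c in itertools.product([0, 1], repeat=NUM_CONCEPTS)
        for y in CLASSES
    ]
    weights = [math.exp(-joint_energy(c, y)) for c, y in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}

def conditional(dist, query, evidence):
    # P(query | evidence), where both are predicates over (c, y).
    den = sum(p for (c, y), p in dist.items() if evidence(c, y))
    num = sum(p for (c, y), p in dist.items() if evidence(c, y) and query(c, y))
    return num / den

dist = joint_distribution()

# Prediction: P(y = 1) for this input.
p_class = conditional(dist, lambda c, y: y == 1, lambda c, y: True)

# Concept correction: how does fixing concept 1 to "on" change concept 2?
p_c2 = conditional(dist, lambda c, y: c[2] == 1, lambda c, y: True)
p_c2_given_c1 = conditional(dist, lambda c, y: c[2] == 1, lambda c, y: c[1] == 1)

# Conditional dependency: P(c2 = 1 | y = 1, c1 = 1).
p_dep = conditional(
    dist, lambda c, y: c[2] == 1, lambda c, y: y == 1 and c[1] == 1
)

print(p_class, p_c2, p_c2_given_c1, p_dep)
```

In this toy model the concepts are coupled only through the class variable, yet fixing one concept still shifts the probability of a correlated one, which is the behavior the abstract describes for concept correction. Exhaustive enumeration is feasible here only because the state space is tiny; it is a simplification for the sketch, not a claim about how inference is done at scale.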