Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interaction between concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"), therefore failing to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are then represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state-of-the-art on real-world datasets.
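The core mechanics described above — defining a joint energy over candidate (input, concept, class) tuples and deriving conditional probabilities by composing energy functions — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the random linear "networks" (`W_c`, `W_y`, `W_cy`), the specific energy terms, and all function names are hypothetical stand-ins for trained neural networks, chosen only to show how a joint energy yields p(c, y | x) and conditionals such as p(cⱼ | y, cᵢ, x) by enumeration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 binary concepts, 2 classes. In an ECBM the energy terms
# below would come from trained neural networks; random linear maps are
# used here purely to illustrate the composition (all hypothetical).
n_concepts, n_classes = 3, 2
x = rng.normal(size=4)                           # dummy input features

W_c = rng.normal(size=(n_concepts, 4))           # input-concept energy "network"
W_y = rng.normal(size=(n_classes, 4))            # input-class energy "network"
W_cy = rng.normal(size=(n_classes, n_concepts))  # concept-class interaction term

def joint_energy(c, y):
    """Joint energy E(x, c, y): lower energy = more compatible tuple."""
    e_concept = -(W_c @ x) @ (2 * c - 1)  # concept bits mapped to {-1, +1}
    e_class = -(W_y @ x)[y]
    e_inter = -(W_cy[y] @ c)              # captures concept-class interaction
    return e_concept + e_class + e_inter

# Enumerate every candidate (c, y) tuple and form the joint distribution
# p(c, y | x) ∝ exp(-E(x, c, y)) via a numerically stable softmax.
candidates = [(np.array(bits), y)
              for bits in np.ndindex(*(2,) * n_concepts)
              for y in range(n_classes)]
energies = np.array([joint_energy(c, y) for c, y in candidates])
probs = np.exp(-(energies - energies.min()))
probs /= probs.sum()

def conditional(concept_idx, concept_val, given_y, given_concepts):
    """p(c[concept_idx] = concept_val | y = given_y, given_concepts, x),
    computed by summing the joint over the matching tuples."""
    num = den = 0.0
    for p, (c, y) in zip(probs, candidates):
        if y != given_y or any(c[j] != v for j, v in given_concepts.items()):
            continue
        den += p
        if c[concept_idx] == concept_val:
            num += p
    return num / den

# e.g. probability of concept 2 given class 0 and concept 0 observed as 1:
p_cond = conditional(2, 1, given_y=0, given_concepts={0: 1})
```

With small discrete concept spaces, exhaustive enumeration suffices; the actual method avoids this exponential sum, but the sketch shows why a single joint energy gives prediction, correction, and conditional dependency quantification through one unified interface.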