Concept Bottleneck Models (CBMs) address the opacity of neural architectures by constructing and explaining their predictions through a set of high-level concepts. A distinctive property of these models is that they permit concept interventions, in which users correct mispredicted concepts and thereby improve the model's performance. Recent work, however, has shown that intervention efficacy can depend heavily on the order in which concepts are intervened on, as well as on the model's architecture and training hyperparameters. We argue that this stems from a CBM's lack of train-time incentives to be appropriately receptive to concept interventions. To address this, we propose Intervention-aware Concept Embedding Models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. IntCEM learns a concept intervention policy end-to-end, from which it samples meaningful intervention trajectories at train time. This conditions IntCEMs to effectively select and receive concept interventions when deployed at test time. Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach.
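To make the notion of a concept intervention concrete, the following is a minimal sketch of a test-time intervention in a vanilla CBM, not of IntCEM itself. It assumes that the concept logits for an input have already been computed by some concept encoder, that the label predictor is a simple linear layer over the concept bottleneck, and that users supply hard 0/1 corrections. All function names (`predict_concepts`, `intervene`, `predict_label`) and the toy numbers are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(concept_logits: Sequence[float]) -> List[float]:
    # Hypothetical concept predictor: assumes logits were already produced
    # from the input by some encoder; maps them to concept probabilities.
    return [sigmoid(z) for z in concept_logits]


def intervene(concept_probs: Sequence[float], corrections: Dict[int, int]) -> List[float]:
    # A concept intervention: overwrite the predicted probability of each
    # corrected concept with the user-provided ground-truth value (0 or 1).
    intervened = list(concept_probs)
    for idx, value in corrections.items():
        intervened[idx] = float(value)
    return intervened


def predict_label(concepts: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> int:
    # Hypothetical linear label predictor over the concept bottleneck;
    # returns the index of the highest-scoring class.
    scores = [sum(w * c for w, c in zip(row, concepts)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=lambda k: scores[k])


# Toy example: 3 concepts, 2 classes.
W = [[1.0, -1.0, 0.0],   # class 0 favours concept 0 and disfavours concept 1
     [-1.0, 1.0, 0.0]]   # class 1 favours concept 1 and disfavours concept 0
b = [0.0, 0.0]

probs = predict_concepts([2.0, -1.0, 0.5])
before = predict_label(probs, W, b)

# The user knows concept 0 is actually absent and concept 1 is present.
after = predict_label(intervene(probs, {0: 0, 1: 1}), W, b)
```

Here the unaided model predicts class 0 because it wrongly believes concept 0 is active; once the user corrects concepts 0 and 1, the same label predictor outputs class 1. Which concepts get corrected, and in what order, is exactly what an intervention policy decides, and it is the sensitivity of this step that the abstract refers to.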