Concept Bottleneck Models (CBMs) tackle the opacity of neural architectures by constructing and explaining their predictions using a set of high-level concepts. A special property of these models is that they permit concept interventions, wherein users can correct mispredicted concepts and thus improve the model's performance. Recent work, however, has shown that intervention efficacy can be highly dependent on the order in which concepts are intervened on, as well as on the model's architecture and training hyperparameters. We argue that this is rooted in a CBM's lack of train-time incentives for the model to be appropriately receptive to concept interventions. To address this, we propose Intervention-aware Concept Embedding models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. Our model learns a concept intervention policy in an end-to-end fashion, from which it can sample meaningful intervention trajectories at train time. This conditions IntCEMs to effectively select and receive concept interventions when deployed at test time. Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach.
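To make the train-time intervention sampling concrete, the following is a minimal PyTorch sketch of how a learned policy might pick which concepts to intervene on during training. The names `concept_encoder`, `policy_net`, `label_predictor`, and `train_step` are hypothetical placeholders rather than the authors' implementation, and the loss shown is a plain concept-plus-task objective; a fully end-to-end version of the policy would additionally need a differentiable relaxation of the sampling step (e.g., Gumbel-Softmax).

```python
# Minimal sketch (not the authors' code) of train-time intervention
# sampling for a concept bottleneck-style model. All module names
# below are hypothetical placeholders.
import torch
import torch.nn.functional as F

def train_step(x, c_true, y_true, concept_encoder, policy_net,
               label_predictor, num_interventions=3):
    # Predict concepts from the input; c_true holds ground-truth
    # concept labels as floats in {0, 1}.
    c_logits = concept_encoder(x)              # (batch, n_concepts)
    c_hat = torch.sigmoid(c_logits)
    intervened = torch.zeros_like(c_hat, dtype=torch.bool)

    for _ in range(num_interventions):
        # The learned policy scores which concept to intervene on next,
        # conditioned on the input and the current concept estimates.
        scores = policy_net(torch.cat([x.flatten(1), c_hat], dim=-1))
        scores = scores.masked_fill(intervened, float('-inf'))
        # Sample one intervention target per example.
        idx = torch.distributions.Categorical(logits=scores).sample()
        # Replace the sampled concept's prediction with ground truth,
        # simulating a user correcting that concept.
        c_hat = c_hat.scatter(1, idx.unsqueeze(1),
                              c_true.gather(1, idx.unsqueeze(1)))
        intervened = intervened.scatter(1, idx.unsqueeze(1), True)

    # Task prediction from the (partially corrected) concepts.
    y_logits = label_predictor(c_hat)
    # Joint loss: concept fidelity + task performance after the
    # sampled intervention trajectory has been applied.
    loss = F.binary_cross_entropy_with_logits(c_logits, c_true) \
         + F.cross_entropy(y_logits, y_true)
    return loss
```

Because the task loss is computed after the sampled interventions are applied, the model is rewarded during training for making good use of corrections, which is the train-time incentive the abstract argues standard CBMs lack.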