While traditional deep learning models often lack interpretability, concept bottleneck models (CBMs) provide inherent explanations via their concept representations. In particular, they let users intervene on these concepts by updating concept values and thereby correcting the model's predictive output. Traditionally, however, these interventions are applied to the model only once and discarded afterward. To rectify this, we present concept bottleneck memory models (CB2M), an extension to CBMs. A CB2M learns to generalize interventions to appropriate novel situations via a two-fold memory, with which it can learn to detect mistakes and to reapply previous interventions. In this way, a CB2M learns to automatically improve model performance from a few initially obtained interventions. If no prior human interventions are available, a CB2M can detect potential mistakes of the CBM bottleneck and request targeted interventions. In our experimental evaluations on challenging scenarios such as distribution shifts and confounded training data, we show that a CB2M can successfully generalize interventions to unseen data and can identify wrongly inferred concepts. Overall, our results show that CB2M is an effective tool for providing interactive feedback on CBMs, e.g., by guiding a user's interaction and requiring fewer interventions.
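The idea of detecting likely mistakes and reapplying stored interventions can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `InterventionMemory`, the Euclidean distance, the `radius` threshold, and the single key/value store standing in for the two-fold memory are all assumptions made for illustration. It assumes each input comes with some encoding vector and a list of predicted concept values.

```python
import math
from dataclasses import dataclass, field


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class InterventionMemory:
    """Hypothetical memory of past mistakes and the interventions that fixed them."""
    radius: float  # assumed threshold for "similar enough to a known mistake"
    keys: list = field(default_factory=list)    # encodings of inputs where the CBM erred
    values: list = field(default_factory=list)  # {concept_index: corrected_value}

    def store(self, encoding, intervention):
        """Remember a human intervention made for the input with this encoding."""
        self.keys.append(list(encoding))
        self.values.append(dict(intervention))

    def nearest(self, encoding):
        """Return (distance, intervention) for the closest stored mistake, or None."""
        if not self.keys:
            return None
        dists = [euclidean(encoding, k) for k in self.keys]
        i = min(range(len(dists)), key=dists.__getitem__)
        return dists[i], self.values[i]

    def is_likely_mistake(self, encoding):
        """Flag an input whose encoding lies close to a previously stored mistake."""
        hit = self.nearest(encoding)
        return hit is not None and hit[0] <= self.radius

    def apply(self, encoding, concepts):
        """Reapply the nearest stored intervention if the input is flagged."""
        hit = self.nearest(encoding)
        corrected = list(concepts)
        if hit is not None and hit[0] <= self.radius:
            for idx, value in hit[1].items():
                corrected[idx] = value
        return corrected


memory = InterventionMemory(radius=0.5)
# A user once corrected concept 2 for an input encoded as [1.0, 0.0].
memory.store(encoding=[1.0, 0.0], intervention={2: 1.0})

near = memory.apply([1.1, 0.1], [0.9, 0.2, 0.1])  # close to the stored mistake
far = memory.apply([5.0, 5.0], [0.9, 0.2, 0.1])   # unrelated input
print(near)
print(far)
```

In this toy setup, the input near the stored mistake has concept 2 overwritten by the remembered correction, while the distant input keeps its predicted concepts. When the memory is empty, `is_likely_mistake` always returns `False`, so a real system would need a separate way to flag candidates for the targeted intervention requests described above.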