While traditional deep learning models often lack interpretability, concept bottleneck models (CBMs) provide inherent explanations via their concept representations. In particular, they allow users to intervene on these concepts: by updating concept values, a user can correct the model's predictive output. Traditionally, however, such an intervention is applied to the model only once and discarded afterward. To rectify this, we present concept bottleneck memory models (CB2Ms), an extension to CBMs. A CB2M learns to generalize interventions to appropriate novel situations via a two-fold memory, with which it can learn to detect mistakes and to reapply previous interventions. In this way, a CB2M learns to automatically improve model performance from a few initially obtained interventions. If no prior human interventions are available, a CB2M can detect potential mistakes of the CBM bottleneck and request targeted interventions. In our experimental evaluations on challenging scenarios, such as distribution shifts and confounded training data, we show that CB2Ms can successfully generalize interventions to unseen data and can indeed identify wrongly inferred concepts. Overall, our results show that CB2M helps users provide interactive feedback on CBMs, e.g., by guiding a user's interaction and requiring fewer interventions.
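The two-fold memory described above can be illustrated with a minimal sketch. The abstract does not specify how mistakes are detected or how stored interventions are matched to new inputs, so everything here is an assumption made for illustration: the class name `TwoFoldMemory`, the use of Euclidean distance between input encodings, and the two radius thresholds are hypothetical and are not taken from the paper.

```python
import math


class TwoFoldMemory:
    """Illustrative two-part memory: known mistakes plus stored interventions.

    Assumption: similarity between inputs is Euclidean distance between
    their encodings, and a fixed radius decides what counts as "similar".
    """

    def __init__(self, mistake_radius, intervention_radius):
        self.mistake_radius = mistake_radius
        self.intervention_radius = intervention_radius
        self.mistakes = []       # encodings where the bottleneck was wrong
        self.interventions = []  # (encoding, {concept_index: corrected_value})

    def store(self, encoding, correction):
        """Record a human intervention made for the input with this encoding."""
        self.mistakes.append(list(encoding))
        self.interventions.append((list(encoding), dict(correction)))

    def is_potential_mistake(self, encoding):
        """Flag an input that lies close to any previously recorded mistake."""
        return any(
            math.dist(encoding, stored) <= self.mistake_radius
            for stored in self.mistakes
        )

    def reapply(self, encoding, concepts):
        """Return concepts patched with the nearest stored intervention, if close enough."""
        patched = list(concepts)
        if not self.interventions:
            return patched
        stored, correction = min(
            self.interventions, key=lambda item: math.dist(encoding, item[0])
        )
        if math.dist(encoding, stored) <= self.intervention_radius:
            for index, value in correction.items():
                patched[index] = value
        return patched


# Usage: one stored intervention sets concept 1 to 1.0 for a known mistake.
memory = TwoFoldMemory(mistake_radius=0.5, intervention_radius=0.5)
memory.store([0.0, 0.0], {1: 1.0})

near = [0.1, 0.1]  # close to the stored mistake
far = [3.0, 3.0]   # unrelated input

near_flagged = memory.is_potential_mistake(near)
near_concepts = memory.reapply(near, [0.0, 0.0, 1.0])
far_flagged = memory.is_potential_mistake(far)
far_concepts = memory.reapply(far, [0.0, 0.0, 1.0])
```

In this sketch, an input flagged as a potential mistake with no sufficiently close stored intervention would be the natural candidate for requesting a targeted human intervention; in a learned system the fixed radii would presumably be replaced by tuned or learned criteria.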