Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

Recently, interpretable machine learning has revisited concept bottleneck models (CBMs). An advantage of this model class is the user's ability to intervene on predicted concept values, which in turn affects the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, given only a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We focus on backbone architectures of varying complexity, from simple, fully connected neural nets to Stable Diffusion. We demonstrate that the proposed fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of our techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes are more intervenable than CBMs. Lastly, we establish that our methods remain effective under vision-language-model-based concept annotations, alleviating the need for a human-annotated validation set.
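To make the idea of a concept-based intervention on a black box more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not specify the procedure, so the linear probe (`W`, `b`), the downstream head (`v`), the penalty-style objective, and all hyperparameters below are illustrative assumptions. The sketch edits an intermediate activation vector so that a probe's concept predictions move toward user-specified values while the activations stay close to the original, then checks how the downstream prediction changes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical linear probe: maps 3 activations to 2 concept logits.
# In practice such a probe would be fitted on a validation set with concept labels.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.zeros(2)

# Hypothetical downstream head of the black box (binary classifier).
v = np.array([2.0, -2.0, 0.0])

def probe(z):
    """Predicted concept probabilities from activations z."""
    return sigmoid(W @ z + b)

def head(z):
    """Predicted target probability from activations z."""
    return sigmoid(v @ z)

def intervene(z, c_target, lam=5.0, lr=0.1, steps=200):
    """Assumed objective: lam * BCE(probe(z'), c_target) + ||z' - z||^2,
    minimised over z' by plain gradient descent."""
    z_new = z.copy()
    for _ in range(steps):
        p = probe(z_new)
        # Gradient of the summed BCE w.r.t. z' is W^T (p - c_target);
        # gradient of the squared distance is 2 (z' - z).
        grad = lam * (W.T @ (p - c_target)) + 2.0 * (z_new - z)
        z_new -= lr * grad
    return z_new

z = np.array([-1.0, 1.0, 0.5])      # original activations
c_target = np.array([1.0, 0.0])     # concept values supplied by the user
z_new = intervene(z, c_target)

print("concepts before/after:", probe(z), probe(z_new))
print("target prob before/after:", head(z), head(z_new))
```

In this toy setting, the effectiveness of an intervention could be scored by the target loss evaluated at `head(z_new)`; the precise definition of intervenability used in the paper is not given in the abstract.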

