Post-hoc Concept Bottleneck Models

Concept Bottleneck Models (CBMs) map inputs onto a set of interpretable concepts ("the bottleneck") and use those concepts to make predictions. A concept bottleneck enhances interpretability because it can be inspected to understand which concepts the model "sees" in an input and which of these concepts it deems important. However, CBMs are restrictive in practice: they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, which reduces the incentive to deploy them. In this work, we address these limitations by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available for the training data, we show that a PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models. A key benefit of PCBMs is that they enable users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBMs allow for global model edits, which can be more efficient than previous work on local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain and without model retraining.
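To make the idea concrete, the following is a minimal sketch of a post-hoc bottleneck and a global concept-level edit. The abstract does not specify how the bottleneck is built, so everything here is an assumption for illustration: concepts are represented as vectors in the backbone's embedding space, concept scores are projections onto those vectors, the predictor is a ridge-regularized linear head, and an "edit" sets one concept's weights to zero. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def project_onto_concepts(embeddings, concept_vectors):
    """Concept scores: projection of each embedding onto each unit-norm concept vector."""
    norms = np.linalg.norm(concept_vectors, axis=1, keepdims=True)
    return embeddings @ (concept_vectors / norms).T

def fit_linear_head(scores, labels, n_classes, l2=1e-3):
    """Ridge regression with intercept onto one-hot labels (stand-in for any linear predictor)."""
    onehot = np.eye(n_classes)[labels]
    mu = scores.mean(axis=0)
    centered = scores - mu
    gram = centered.T @ centered + l2 * np.eye(scores.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (onehot - onehot.mean(axis=0)))
    bias = onehot.mean(axis=0) - mu @ weights
    return weights, bias  # weights: (n_concepts, n_classes)

def predict(scores, weights, bias):
    return np.argmax(scores @ weights + bias, axis=1)

def prune_concept(weights, concept_index):
    """Global edit: remove one concept's influence on every class, with no retraining."""
    edited = weights.copy()
    edited[concept_index, :] = 0.0
    return edited

def make_data(rng, n, spurious_sign):
    """Synthetic 'backbone embeddings': dim 0 is a core cue, dim 1 a spurious cue, dim 2 noise."""
    labels = np.arange(n) % 2
    s = 2.0 * labels - 1.0
    core = s + 0.3 * rng.standard_normal(n)
    spurious = spurious_sign * 2.0 * s + 0.3 * rng.standard_normal(n)
    other = rng.standard_normal(n)
    return np.stack([core, spurious, other], axis=1), labels

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical concept bank: concept 0 = core cue, concept 1 = spurious cue.
    concepts = np.array([[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
    x_train, y_train = make_data(rng, 500, spurious_sign=+1.0)
    x_shift, y_shift = make_data(rng, 500, spurious_sign=-1.0)  # spurious cue flips

    weights, bias = fit_linear_head(project_onto_concepts(x_train, concepts), y_train, 2)
    shift_scores = project_onto_concepts(x_shift, concepts)

    acc_train = np.mean(predict(project_onto_concepts(x_train, concepts), weights, bias) == y_train)
    acc_before = np.mean(predict(shift_scores, weights, bias) == y_shift)
    acc_after = np.mean(predict(shift_scores, prune_concept(weights, 1), bias) == y_shift)
    return acc_train, acc_before, acc_after

if __name__ == "__main__":
    print(run_demo())
```

In this toy setup the edit touches only the linear head: the backbone embeddings and the concept vectors stay fixed, and no data from the shifted distribution is used to make the edit. That mirrors the kind of global, concept-level edit the abstract describes, without claiming to reproduce the authors' procedure.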

