Concept Bottleneck Models (CBMs) provide interpretable predictions by introducing an intermediate Concept Bottleneck Layer (CBL), which encodes human-understandable concepts to explain the model's decisions. Recent works have proposed leveraging Large Language Models (LLMs) and pre-trained Vision-Language Models (VLMs) to automate the training of CBMs, making it more scalable. However, existing approaches still fall short in two aspects: First, the concepts predicted by the CBL often mismatch the input image, raising doubts about the faithfulness of the interpretation. Second, concept values have been shown to encode unintended information: even a set of random concepts can achieve test accuracy comparable to state-of-the-art CBMs. To address these critical limitations, we propose a novel framework, the Vision-Language-Guided Concept Bottleneck Model (VLG-CBM), that enables faithful interpretability together with improved performance. Our method leverages off-the-shelf open-domain grounded object detectors to provide visually grounded concept annotations, which largely enhances the faithfulness of concept prediction while further improving model performance. In addition, we propose a new metric, the Number of Effective Concepts (NEC), to control information leakage and provide better interpretability. Extensive evaluations across five standard benchmarks show that VLG-CBM outperforms existing methods by at least 4.27% and up to 51.09% in accuracy at NEC=5 (denoted ANEC-5), and by at least 0.45% and up to 29.78% in average accuracy (denoted ANEC-avg), while preserving both the faithfulness and the interpretability of the learned concepts, as demonstrated in extensive experiments.
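For concreteness, one natural formalization of NEC, stated here as an assumption since the abstract itself does not define the metric, counts the average number of concepts receiving non-zero weight per class in the final linear layer $W \in \mathbb{R}^{K \times C}$ (with $K$ classes and $C$ concepts):

```latex
% Hedged reading of NEC: average number of concepts with
% non-zero weight per class in the final linear layer W.
\mathrm{NEC}(W) \;=\; \frac{1}{K} \sum_{k=1}^{K} \bigl\lVert W_{k,:} \bigr\rVert_{0}
```

Under this reading, ANEC-5 is the test accuracy obtained when the final layer is sparsified so that NEC equals 5, i.e., each class decision is explained by roughly five concepts on average, and ANEC-avg plausibly averages accuracy over a range of NEC values.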
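A minimal sketch of how one might enforce a target NEC by magnitude-pruning the final linear layer; the helper names `prune_to_nec` and `nec`, the layer shapes, and the use of plain top-k magnitude selection are illustrative assumptions, not the paper's exact procedure (the method may instead control sparsity during training).

```python
import torch

def prune_to_nec(linear: torch.nn.Linear, target_nec: int) -> torch.nn.Linear:
    """Keep only the `target_nec` largest-magnitude concept weights per class,
    so the average number of effective concepts per class equals target_nec.
    (Illustrative sketch; post-hoc pruning is one possible way to hit a NEC.)"""
    W = linear.weight.data  # shape: (num_classes, num_concepts)
    k = min(target_nec, W.shape[1])
    # Indices of the top-k weights (by absolute value) for each class.
    topk = W.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(W)
    mask.scatter_(1, topk, 1.0)
    linear.weight.data = W * mask
    return linear

def nec(linear: torch.nn.Linear) -> float:
    """Average number of concepts with non-zero weight per class."""
    return (linear.weight.data != 0).sum(dim=1).float().mean().item()

# Example: a final layer mapping 200 concepts to 10 classes, pruned to NEC=5.
head = torch.nn.Linear(200, 10)
prune_to_nec(head, 5)
print(nec(head))  # -> 5.0
```

Evaluating the pruned head on held-out data would then yield the ANEC-5 figure under this reading of the metric.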