Improving Concept Alignment in Vision-Language Concept Bottleneck Models

Concept Bottleneck Models (CBMs) map an input image to a high-level, human-understandable concept space and then make class predictions based on these concepts. Recent approaches automate the construction of CBMs by prompting Large Language Models (LLMs) to generate text concepts and then using Vision-Language Models (VLMs) to obtain concept scores for training a CBM. However, to make CBMs more trustworthy, it is desirable to build them with concepts defined by human experts rather than LLM-generated concepts. In this work, we take a closer look at the faithfulness of VLM concept scores for such expert-defined concepts in domains such as fine-grained bird species classification and animal classification. Our investigations reveal that frozen VLMs, such as CLIP, struggle to correctly associate a concept with the corresponding visual input despite achieving high classification performance. To address this, we propose a novel Contrastive Semi-Supervised (CSS) learning method that uses a few labeled concept examples to improve concept alignment (activating truthful visual concepts) in the CLIP model. Extensive experiments on three benchmark datasets show that our approach substantially increases both concept accuracy and classification accuracy while requiring only a fraction of the human-annotated concept labels. To further improve classification performance, we also introduce a new class-level intervention procedure for fine-grained classification problems that identifies the confounding classes and intervenes on their concept space to reduce errors.
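The two-stage pipeline described in the abstract (VLM concept scores first, class prediction from those scores second) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: random vectors stand in for CLIP image and text embeddings, cosine similarity is used as the concept score, and a single linear layer is used as the classifier over concepts. The function names are hypothetical, and the sketch does not implement the CSS training method or the class-level intervention procedure.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def concept_scores(image_emb, concept_embs):
    """Cosine similarity between one image embedding and each concept embedding.

    Assumption: in a real setup these embeddings would come from a frozen
    VLM's image and text encoders; here they are plain lists of floats.
    """
    img = normalize(image_emb)
    return [sum(a * b for a, b in zip(img, normalize(c))) for c in concept_embs]


def class_logits(scores, weights, bias):
    """Linear layer over the concept bottleneck.

    weights[k][j] is the contribution of concept j to class k.
    """
    return [sum(w * s for w, s in zip(row, scores)) + b
            for row, b in zip(weights, bias)]


def predict(image_emb, concept_embs, weights, bias):
    """Return the concept scores and the index of the highest-scoring class."""
    scores = concept_scores(image_emb, concept_embs)
    logits = class_logits(scores, weights, bias)
    return scores, max(range(len(logits)), key=logits.__getitem__)


# Hypothetical sizes and random stand-in data, not values from the paper.
random.seed(0)
DIM, N_CONCEPTS, N_CLASSES = 16, 6, 3
image_emb = [random.gauss(0, 1) for _ in range(DIM)]
concept_embs = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_CONCEPTS)]
weights = [[random.gauss(0, 1) for _ in range(N_CONCEPTS)] for _ in range(N_CLASSES)]
bias = [0.0] * N_CLASSES

scores, predicted_class = predict(image_emb, concept_embs, weights, bias)
print("concept scores:", [round(s, 3) for s in scores])
print("predicted class index:", predicted_class)
```

Because the class prediction depends only on the concept scores, a prediction can be traced back to the concepts that drove it, which is why the faithfulness of those scores matters for the kind of analysis the abstract describes.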

