Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. However, the non-interpretable, black-box nature of this prediction pipeline can be a challenge, especially in critical domains such as healthcare, finance, and security. In this paper, we explore the potential of Concept Bottleneck Models (CBMs) for transforming complex, non-interpretable foundation models into interpretable decision-making pipelines using high-level concept vectors. Specifically, we focus on the test-time deployment of such an interpretable CBM pipeline "in the wild", where the input distribution often shifts from the original training distribution. We first identify the potential failure modes of such a pipeline under different types of distribution shifts. Then we propose an adaptive concept bottleneck framework that addresses these failure modes by dynamically adapting the concept-vector bank and the prediction layer based solely on unlabeled data from the target domain, without access to the source (training) dataset. Empirical evaluations with various real-world distribution shifts show that our adaptation method produces concept-based interpretations better aligned with the test data and boosts post-deployment accuracy by up to 28%, aligning the CBM performance with that of non-interpretable classification.