Adapting foundation models for specific purposes has become a standard approach for building machine learning systems for downstream applications. Yet it remains an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., the shape, color, or semantics of an object) together with their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While concept activations differ slightly between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained by concepts already present in the non-adapted foundation model. This work offers a concrete framework for training and using SAEs for Vision Transformers and provides insights into the mechanisms of adaptation.
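To make the setup concrete, the following is a minimal sketch of a PatchSAE-style sparse autoencoder applied to CLIP ViT token activations. It is not the authors' released implementation: the choice of a ReLU encoder with an L1 sparsity penalty, and all names and hyperparameters (`PatchSAESketch`, `d_model`, `n_latents`, `l1_coeff`) are illustrative assumptions.

```python
# Minimal sketch (assumption: a standard ReLU SAE with an L1 sparsity penalty,
# trained per token on CLIP ViT activations; names and sizes are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchSAESketch(nn.Module):
    def __init__(self, d_model: int = 768, n_latents: int = 16384):
        super().__init__()
        # Overcomplete dictionary: n_latents >> d_model, so each latent
        # can specialize to an interpretable concept.
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.b_pre = nn.Parameter(torch.zeros(d_model))  # pre-encoder bias

    def forward(self, x: torch.Tensor):
        # x: [batch, n_tokens, d_model] -- CLS + patch tokens from one ViT layer.
        z = F.relu(self.enc(x - self.b_pre))   # sparse latent (concept) activations
        x_hat = self.dec(z) + self.b_pre       # reconstruction of the activations
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 3e-4):
    # Reconstruction fidelity plus a sparsity objective on the latents.
    return F.mse_loss(x_hat, x) + l1_coeff * z.abs().sum(dim=-1).mean()

# Usage: hook a CLIP ViT layer, collect its token activations, train the SAE.
sae = PatchSAESketch()
tokens = torch.randn(4, 197, 768)  # e.g. ViT-B/16 at 224px: 1 CLS + 196 patches
x_hat, z = sae(tokens)
loss = sae_loss(tokens, x_hat, z)
loss.backward()
```

Because the SAE is applied to every token rather than only the CLS token, the latent activations `z` carry one value per image patch, which is what enables the patch-wise spatial attribution of concepts described above.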