Conceptualizing Embeddings: Sparse Disentanglement for Vision-Language Models

Vision-language models learn powerful multimodal embeddings, yet their internal semantics remain opaque. Sparse autoencoders (SAEs) can extract interpretable features, but they do so by expanding the representation dimension, which distorts the original geometry and introduces redundancy. We introduce CEDAR (Conceptual Embedding Disentanglement via Adaptive Rotation), a post-hoc method that reveals the compositional structure of pretrained embeddings without increasing dimensionality. By learning an invertible transformation with a top-$k$ sparsity bottleneck, CEDAR concentrates semantic information into axis-aligned, disentangled coordinates. In CLIP-like architectures, individual coordinates can be interpreted with textual concepts, while for generative models such as BLIP, they can be decoded into natural language descriptions. Experiments demonstrate that CEDAR achieves a competitive reconstruction-sparsity trade-off while producing explanations that are more interpretable and better aligned with human perception. Our results suggest that the apparent entanglement in vision-language representations can be resolved through a suitable change of basis, eliminating the need for overcomplete expansions.

