CLIP embeddings have demonstrated remarkable performance across a wide range of multimodal applications. However, these high-dimensional, dense vector representations are not easily interpretable, limiting our understanding of the rich structure of CLIP and its use in downstream applications that require transparency. In this work, we show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability, allowing for the decomposition of representations into semantic concepts. We formulate this problem as one of sparse recovery and propose a novel method, Sparse Linear Concept Embeddings (SpLiCE), for transforming CLIP representations into sparse linear combinations of human-interpretable concepts. Distinct from previous work, SpLiCE is task-agnostic and can be used, without training, to explain and even replace traditional dense CLIP representations, maintaining high downstream performance while significantly improving their interpretability. We also demonstrate significant use cases of SpLiCE representations, including detecting spurious correlations and model editing.
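The core idea of decomposing an embedding into a sparse linear combination of concept vectors can be sketched with a simple greedy sparse-recovery routine. The snippet below is a minimal, hypothetical illustration in the spirit of orthogonal matching pursuit: it assumes a matrix of unit-norm concept embeddings (here random placeholders, not a real CLIP concept dictionary) and is not the authors' exact solver.

```python
import numpy as np

def sparse_decompose(z, concepts, k=3):
    """Greedy (OMP-style) sketch: express embedding `z` as a sparse
    linear combination of at most `k` rows of `concepts`.
    Illustrative only; not the SpLiCE implementation."""
    residual = z.copy()
    support = []
    coefs = np.zeros(0)
    for _ in range(k):
        # Pick the concept most correlated with the current residual.
        scores = concepts @ residual
        idx = int(np.argmax(np.abs(scores)))
        if idx not in support:
            support.append(idx)
        # Re-fit coefficients on the selected concepts via least squares.
        A = concepts[support].T               # shape (d, |support|)
        coefs, *_ = np.linalg.lstsq(A, z, rcond=None)
        residual = z - A @ coefs
    w = np.zeros(len(concepts))               # sparse weight vector
    w[support] = coefs
    return w

# Toy usage with an orthonormal "concept dictionary" (placeholder data):
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
concepts = Q[:10]                             # 10 unit-norm concept vectors
z = 0.8 * concepts[0] + 0.6 * concepts[2]     # synthetic "embedding"
w = sparse_decompose(z, concepts, k=2)        # recovers weights at indices 0 and 2
```

With an orthonormal dictionary the greedy procedure recovers the true weights exactly; real concept dictionaries are overcomplete and correlated, which is why the paper frames the problem as sparse recovery rather than plain regression.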