This study applies Independent Component Analysis (ICA) to uncover universal properties of word and image embeddings. Our approach extracts independent semantic components from embeddings, so that each embedding can be represented as a composition of intrinsic, interpretable axes. We show that each embedding can be expressed as a combination of only a few axes, and that these semantic axes are consistent across languages, modalities, and embedding algorithms. Discovering such universal properties contributes to model interpretability and may facilitate both the development of highly interpretable models and the compression of large-scale models.
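To make the idea of "independent semantic axes" concrete, the following is a minimal NumPy sketch of symmetric FastICA with the log-cosh contrast, assuming embeddings are stacked as the rows of a matrix. It is an illustration, not the authors' implementation: the function names (`whiten`, `fast_ica`, `top_axes`), the iteration settings, and the rule that flips each axis to positive skewness are all assumptions. The toy data are synthetic skewed sources under a random mixing matrix, not real embeddings.

```python
import numpy as np


def whiten(X):
    """Center X (n_samples x dim) and PCA-whiten it to identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal direction to unit variance.
    W_white = eigvecs / np.sqrt(eigvals)
    return Xc @ W_white, W_white


def sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh (log-cosh contrast).

    Returns S (n_samples x dim): one estimated independent axis per column.
    """
    Z, _ = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)  # n x d nonlinearity of current estimates
        g_prime = 1.0 - g ** 2
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = sym_decorrelate((g.T @ Z) / n - g_prime.mean(axis=0)[:, None] * W)
        # Converged when each row is (up to sign) parallel to its previous value.
        lim = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if lim < tol:
            break
    S = Z @ W.T
    # Assumed convention: flip each axis so its distribution is positively skewed.
    S *= np.where(np.mean(S ** 3, axis=0) < 0, -1.0, 1.0)
    return S


def top_axes(S, i, k=3):
    """Indices of the k axes on which row (embedding) i has the largest values."""
    return np.argsort(S[i])[::-1][:k]


# Toy check: recover skewed, independent sources from a random linear mixture.
rng = np.random.default_rng(42)
S_true = rng.exponential(size=(5000, 4)) - 1.0  # centered, positively skewed
A = rng.standard_normal((4, 4))                 # hypothetical mixing matrix
S_hat = fast_ica(S_true @ A)
corr = np.corrcoef(S_true.T, S_hat.T)[:4, 4:]
print(np.round(corr.max(axis=1), 3))
print(top_axes(S_hat, 0, k=2))
```

Each column of `S_hat` plays the role of one axis. Because the recovered axes are sparse and heavy-tailed, a single row tends to be dominated by a handful of large coordinates, and `top_axes` lists them. This mirrors, in miniature, the claim that an embedding is a combination of a few axes.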