This study uses Independent Component Analysis (ICA) to reveal a consistent semantic structure in word and image embeddings. Our approach extracts independent semantic components from the embeddings of a pre-trained model by exploiting the anisotropic information that remains after the whitening step of Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic, interpretable axes, and that these semantic axes remain consistent across languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometry of embeddings deepens our understanding of embedding representations.
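The pipeline described in the abstract (PCA whitening, then ICA on the whitened embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `whiten`, `sym_decorrelate`, and `fast_ica` are hypothetical, the update rule is a generic symmetric FastICA with a tanh nonlinearity, and the "embeddings" are synthetic mixtures of Laplace-distributed sources standing in for a real pre-trained model.

```python
import numpy as np

def whiten(X):
    """PCA-whiten the rows of X: zero mean, identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes, then rescale each axis to unit variance.
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(W):
    """Orthogonalize W symmetrically: (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(Z, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z of shape (n, d)."""
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                    # current component estimates, shape (n, d)
        G = np.tanh(Y)                 # nonlinearity g
        G_prime = 1.0 - G ** 2         # its derivative g'
        # Fixed-point update per row w_i: E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W = sym_decorrelate(W_new)
    return Z @ W.T, W

# Toy stand-in for embeddings: non-Gaussian sources mixed by a random matrix.
rng = np.random.default_rng(0)
S_true = rng.laplace(size=(5000, 4))   # assumed "semantic" sources
A = rng.standard_normal((4, 4))        # assumed mixing matrix
X = S_true @ A.T                       # observed "embeddings"

Z = whiten(X)                          # whitening removes second-order structure
S_est, W = fast_ica(Z)                 # ICA uses the remaining non-Gaussianity

# Absolute correlation between each estimated component and each true source.
corr = np.abs(np.corrcoef(S_est.T, S_true.T)[:4, 4:])
print(np.round(corr, 2))
```

After whitening, the covariance of `Z` is the identity, so any remaining structure is non-Gaussian (anisotropic in the higher-order sense), and that is what the rotation `W` is fitted to. In the toy setting each true source should correlate strongly with exactly one estimated component, up to permutation and sign.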