Independent Component Analysis (ICA) is an effective method for interpreting the intrinsic geometric structure of embeddings as semantic components. ICA theory assumes that embeddings can be linearly decomposed into independent components, but real-world data often violate this assumption. Consequently, non-independencies remain between the estimated components that ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that a large higher-order correlation between two components indicates a strong semantic association between them. We visualized the overall structure of these associations using a maximum spanning tree of semantic components. These findings allow for a deeper understanding of embeddings through ICA.
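The pipeline described above (measure residual dependence between estimated components, then link components with a maximum spanning tree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy components are synthetic rather than ICA output from real embeddings, the higher-order correlation is taken to be the Pearson correlation between squared components (one possible choice, not necessarily the definition used in the paper), and the helper names `pearson`, `higher_order_correlation`, and `maximum_spanning_tree` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def higher_order_correlation(x, y):
    """Assumed measure: correlation between the squared components."""
    return pearson([v * v for v in x], [v * v for v in y])


def maximum_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric weight matrix, heaviest edges first."""
    k = len(weights)
    parent = list(range(k))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        ((weights[i][j], i, j) for i in range(k) for j in range(i + 1, k)),
        reverse=True,
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two separate subtrees
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Toy data (synthetic): components 0 and 1 share a random scale, as do 2 and 3.
# Each pair is linearly uncorrelated but not independent.
rng = random.Random(0)
n_samples = 20000
components = [[] for _ in range(4)]
for _ in range(n_samples):
    scale_a = rng.choice([0.5, 1.5])
    scale_b = rng.choice([0.5, 1.5])
    components[0].append(scale_a * rng.gauss(0.0, 1.0))
    components[1].append(scale_a * rng.gauss(0.0, 1.0))
    components[2].append(scale_b * rng.gauss(0.0, 1.0))
    components[3].append(scale_b * rng.gauss(0.0, 1.0))

k = len(components)
linear = [[pearson(components[i], components[j]) for j in range(k)] for i in range(k)]
weights = [
    [higher_order_correlation(components[i], components[j]) for j in range(k)]
    for i in range(k)
]
tree = maximum_spanning_tree(weights)

for i, j, w in tree:
    print(f"component {i} -- component {j}: "
          f"higher-order {w:.3f}, linear {linear[i][j]:.3f}")
```

In this toy setting the ordinary correlation between paired components stays near zero while the squared-component correlation does not, so the tree links the dependent pairs first. With real embeddings, the components would come from an ICA implementation, and the tree would have one node per semantic component.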