Word embeddings are among the most important components of natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. Independent Component Analysis (ICA) has been identified as an effective way to address it. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes is arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour yields low-dimensional embeddings that are better than or comparable to those obtained with both PCA and ICA.
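The idea of ordering axes for semantic continuity can be illustrated with a minimal sketch. It assumes that each axis is summarized by an "axis embedding" (here, the normalized mean of the unit-normalized vectors of its top-k words) and that continuity is measured by cosine similarity between adjacent axes. A greedy nearest-neighbor tour is swapped in purely for illustration and is not claimed to be the optimizer used by Axis Tour. The names `axis_embeddings`, `greedy_axis_order`, and `continuity` are hypothetical.

```python
import numpy as np


def axis_embeddings(S, k=2):
    """Summarize each axis of S (n_words x n_axes) by one unit vector.

    Assumption: an axis is represented by the mean of the unit-normalized
    rows of its k highest-scoring words.
    """
    rows = S / np.linalg.norm(S, axis=1, keepdims=True)
    top = np.argsort(-S, axis=0)[:k]      # (k, n_axes): top-k word indices per axis
    V = rows[top].mean(axis=0)            # (n_axes, n_axes): one vector per axis
    return V / np.linalg.norm(V, axis=1, keepdims=True)


def greedy_axis_order(V, start=0):
    """Order axes so that each next axis is the most similar unvisited one.

    Illustrative greedy heuristic, not an exact tour optimizer.
    """
    sim = V @ V.T                         # cosine similarities (rows of V are unit vectors)
    order = [start]
    remaining = set(range(len(V))) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: sim[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def continuity(V, order):
    """Sum of cosine similarities between consecutive axes in the given order."""
    return float(sum(V[a] @ V[b] for a, b in zip(order, order[1:])))
```

Given an ICA-transformed matrix `S`, the reordered embeddings would then be `S[:, greedy_axis_order(axis_embeddings(S))]`; the values themselves are unchanged, only the column order differs.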