Word embeddings are among the most important components of natural language processing, but interpreting high-dimensional embeddings remains challenging. Independent Component Analysis (ICA) has been identified as an effective way to address this problem: ICA-transformed word embeddings reveal interpretable semantic axes. However, the order of these axes is arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour yields better low-dimensional embeddings than both PCA and ICA.
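The idea of ordering axes for semantic continuity can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes that each axis is represented by the mean embedding of its top-scoring words, that continuity is measured by cosine similarity between adjacent axis representations, and that a greedy nearest-neighbor heuristic stands in for a proper tour optimizer. The names `axis_vectors`, `greedy_axis_order`, and `top_k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def axis_vectors(embeddings, top_k):
    """Assumed axis representation: for each axis, average the embeddings
    of the top_k words with the largest value on that axis."""
    n_axes = len(embeddings[0])
    reps = []
    for j in range(n_axes):
        top = sorted(range(len(embeddings)),
                     key=lambda i: embeddings[i][j], reverse=True)[:top_k]
        reps.append([sum(embeddings[i][d] for i in top) / len(top)
                     for d in range(n_axes)])
    return reps


def greedy_axis_order(reps, start=0):
    """Greedy heuristic (an assumption, not an exact solver): repeatedly
    append the unvisited axis most similar to the last one placed.
    Ties are broken in favor of the smaller axis index."""
    order = [start]
    remaining = set(range(len(reps))) - {start}
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: (cosine(reps[last], reps[j]), -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy "ICA-transformed" embeddings: 4 words x 4 axes (made-up values).
# Axes 0 and 2 share top words, as do axes 1 and 3.
toy = [
    [1.0, 0.0, 0.6, 0.0],
    [0.6, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.6],
    [0.0, 0.6, 0.0, 1.0],
]
order = greedy_axis_order(axis_vectors(toy, top_k=1))
print(order)
```

On this toy input, the related axes 0 and 2 end up adjacent, as do axes 1 and 3. A greedy pass is only a rough approximation: it fixes the start axis and can make poor late choices, so a dedicated traveling-salesman heuristic would be the natural replacement at realistic dimensionalities.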