Node embedding learns low-dimensional vector representations for the nodes of a graph. Recent state-of-the-art embedding approaches take Personalized PageRank (PPR) as the proximity measure and factorize the PPR matrix, or an adaptation of it, to generate embeddings. However, little prior work analyzes what information these approaches encode, or how that information correlates with their strong performance on downstream tasks. In this work, we first show that state-of-the-art embedding approaches that factorize a PPR-related matrix can be unified into a closed-form framework. We then study whether the embeddings generated by this strategy can be inverted to recover the graph topology more faithfully than random-walk-based embeddings. To this end, we propose two methods for recovering graph topology from PPR-based embeddings: an analytical method and an optimization method. Extensive experimental results demonstrate that embeddings generated by factorizing a PPR-related matrix retain more topological information, such as common edges and community structures, than those generated by random-walk-based approaches, offering a new way to systematically understand why PPR-based node embedding approaches outperform random-walk-based alternatives across downstream tasks. To the best of our knowledge, this is the first work to focus on the interpretability of PPR-based node embedding approaches.
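The pipeline the abstract describes, computing a PPR proximity matrix and factorizing it into node embeddings, can be illustrated with a minimal sketch. This is a hypothetical, simplified instance for intuition only (plain power iteration and a dense SVD on a toy graph), not the unified framework or the factorization objective of any specific method from the paper; the function names, the teleport probability `alpha`, and the embedding dimension `d` are illustrative choices.

```python
import numpy as np

def ppr_matrix(A, alpha=0.15, iters=100):
    """PPR matrix via power iteration: Pi = alpha * sum_t (1-alpha)^t P^t,
    where P is the row-stochastic transition matrix of adjacency A."""
    P = A / A.sum(axis=1, keepdims=True)
    Pi = alpha * np.eye(len(A))        # t = 0 term
    term = alpha * np.eye(len(A))
    for _ in range(iters):
        term = (1 - alpha) * term @ P  # next term of the geometric series
        Pi += term
    return Pi

def embed(Pi, d):
    """Rank-d factorization of the PPR matrix via truncated SVD;
    returns one d-dimensional embedding per node."""
    U, s, Vt = np.linalg.svd(Pi)
    return U[:, :d] * np.sqrt(s[:d])

# Toy example: a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Pi = ppr_matrix(A)
X = embed(Pi, d=2)   # 4 nodes, 2-dimensional embeddings
```

Recovering topology from `X`, the question the paper studies, would then mean inverting this factorization to approximate `Pi` (and from it, the edges of `A`), either analytically or by optimization.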