Following the success of Word2Vec embeddings, graph embeddings (GEs) have gained substantial traction. GEs are commonly generated and evaluated extrinsically on downstream applications, but intrinsic evaluations of how well they preserve the original graph's topological structure and semantic information have been lacking. Such evaluations help identify where the various families of GE methods fall short when vectorizing graphs, whether by failing to preserve relevant knowledge or by learning incorrect knowledge. To address this, we propose RESTORE, a framework for intrinsic GE assessment through graph reconstruction. We show that reconstructing the original graph from its GEs yields insights into the relative amount of information preserved in a given vector form. We first introduce the graph reconstruction task. We then generate GEs on the CommonSense Knowledge Graph (CSKG) using representative algorithms from three GE families: factorization methods, random walks, and deep learning. We analyze how effectively they preserve (a) topological structure, measured through node-level graph reconstruction with an increasing number of hops, and (b) semantic information, measured on various word semantic and analogy tests. Our evaluations show that the deep learning-based GE algorithm (SDNE) is better overall at preserving (a), with a mean average precision (mAP) of 0.54 and 0.35 for 2- and 3-hop reconstruction respectively, while the factorization-based algorithm (HOPE) is better at encapsulating (b), with an average Euclidean distance of 0.14, 0.17, and 0.11 for 1-, 2-, and 3-hop reconstruction respectively. The modest performance of these GEs leaves room for further research on better graph representation learning.
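To make the node-level reconstruction metric concrete, the following is a minimal sketch of how mAP over k-hop neighborhoods could be computed from a set of embeddings. It is not the RESTORE implementation: ranking candidates by Euclidean distance in embedding space, the tie-breaking rule, the function names (`k_hop_neighbors`, `average_precision`, `reconstruction_map`), and the toy path graph are all assumptions made for illustration.

```python
import math
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Return every node reachable from `source` in at most k hops (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {source}


def average_precision(ranked, relevant):
    """Average precision of a ranked candidate list against a set of true neighbors."""
    hits, total = 0, 0.0
    for rank, node in enumerate(ranked, start=1):
        if node in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reconstruction_map(adj, emb, k):
    """Mean AP over nodes, ranking candidates by distance in embedding space.

    Assumption: closeness in the embedding should reflect k-hop proximity.
    Ties are broken by node id, so ids must be mutually comparable.
    """
    scores = []
    for node in adj:
        relevant = k_hop_neighbors(adj, node, k)
        if not relevant:
            continue  # isolated nodes contribute nothing
        others = [n for n in adj if n != node]
        ranked = sorted(others, key=lambda n: (math.dist(emb[node], emb[n]), n))
        scores.append(average_precision(ranked, relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: a path graph 0-1-2-3 and two 1-D embeddings.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
good_emb = {0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0]}  # respects the path order
bad_emb = {0: [0.0], 1: [3.0], 2: [1.0], 3: [2.0]}   # scrambles the path order

for hops in (1, 2):
    print(hops, reconstruction_map(adj, good_emb, hops),
          reconstruction_map(adj, bad_emb, hops))
```

On this toy graph, the embedding that respects the path order should score higher than the scrambled one, which is the kind of relative comparison between GE methods that the reconstruction task is meant to expose.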