Following the success of Word2Vec embeddings, graph embeddings (GEs) have gained substantial traction. GEs are commonly generated and evaluated extrinsically on downstream applications, but intrinsic evaluations of how well they preserve the original graph's topological structure and semantic information have been lacking. Understanding this helps identify where the various families of GE methods fall short when vectorizing graphs, whether by failing to preserve relevant knowledge or by learning incorrect knowledge. To address this, we propose RESTORE, a framework for intrinsic GE assessment through graph reconstruction. We show that reconstructing the original graph from the underlying GEs yields insights into the relative amount of information preserved in a given vector form. We first introduce the graph reconstruction task. We then generate GEs on the CommonSense Knowledge Graph (CSKG) using representative algorithms from three GE families: factorization methods, random walks, and deep learning. We analyze how effectively they preserve (a) topological structure, measured by node-level graph reconstruction with an increasing number of hops, and (b) semantic information, measured on various word semantic and analogy tests. Our evaluations show that the deep learning-based GE algorithm (SDNE) is overall better at preserving (a), with a mean average precision (mAP) of 0.54 and 0.35 for 2- and 3-hop reconstruction respectively, while the factorization-based algorithm (HOPE) is better at encapsulating (b), with an average Euclidean distance of 0.14, 0.17, and 0.11 for 1-, 2-, and 3-hop reconstruction respectively. The modest performance of these GEs leaves room for further research on better graph representation learning.
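To make the node-level reconstruction evaluation described in the abstract more concrete, the following is a minimal sketch of how a k-hop mAP score could be computed from a set of embeddings. It is not the RESTORE implementation. It assumes that the ground truth for each node is the set of nodes within k hops in the original graph and that candidates are ranked by Euclidean distance in embedding space. The function names (`k_hop_neighbors`, `average_precision`, `reconstruction_map`) and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import deque

import numpy as np


def k_hop_neighbors(adj, source, k):
    """Nodes reachable from `source` in at most k hops, excluding `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(source)
    return seen


def average_precision(ranked, relevant):
    """Average precision of a ranked candidate list against a relevant set."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, node in enumerate(ranked, start=1):
        if node in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reconstruction_map(emb, adj, k=1):
    """Mean average precision of k-hop neighborhood reconstruction.

    Assumption: row i of `emb` is the embedding of node i, and `adj` maps
    each node id to an iterable of its neighbors in the original graph.
    """
    scores = []
    for u in range(emb.shape[0]):
        relevant = k_hop_neighbors(adj, u, k)
        if not relevant:
            continue  # isolated nodes carry no reconstruction signal
        dist = np.linalg.norm(emb - emb[u], axis=1)
        order = np.argsort(dist, kind="stable").tolist()
        ranked = [v for v in order if v != u]  # never rank the node itself
        scores.append(average_precision(ranked, relevant))
    return float(np.mean(scores)) if scores else 0.0


# Toy path graph 0 - 1 - 2 - 3 with two hypothetical 1-D embeddings.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
faithful = np.array([[0.0], [1.0], [2.0], [3.0]])   # preserves path order
scrambled = np.array([[0.0], [3.0], [1.0], [2.0]])  # breaks path order

for k in (1, 2):
    print(k, reconstruction_map(faithful, adj, k), reconstruction_map(scrambled, adj, k))
```

In this toy setting the embedding that preserves the path order reconstructs every neighborhood perfectly, while the scrambled one scores lower. On a real graph such as CSKG, the pairwise distance computation would need batching or approximate nearest-neighbor search, which this sketch omits.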