Recent papers in the graph machine learning literature have introduced a number of approaches for hyperbolic representation learning. The asserted benefits are improved performance on a variety of graph tasks, including node classification and link prediction. Claims have also been made about the geometric suitability of particular hierarchical graph datasets to representation in hyperbolic space. Despite these claims, our work makes a surprising discovery: when simple Euclidean models with comparable numbers of parameters are properly trained in the same environment, in most cases they perform as well as, if not better than, all introduced hyperbolic graph representation learning models, even on graph datasets previously claimed to be the most hyperbolic as measured by Gromov $\delta$-hyperbolicity (i.e., perfect trees). This observation gives rise to a simple question: how can this be? We answer this question by taking a careful look at the field of hyperbolic graph representation learning as it stands today, and find that a number of papers fail to diligently present baselines, make faulty modelling assumptions when constructing algorithms, and use misleading metrics to quantify the geometry of graph datasets. We take a closer look at each of these three problems, elucidate the issues, perform an analysis of methods, and introduce a parametric family of benchmark datasets to ascertain the applicability of (hyperbolic) graph neural networks.
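For readers unfamiliar with the metric invoked above, the following is a minimal sketch of Gromov $\delta$-hyperbolicity via the standard four-point condition, computed by brute force over all node quadruples of a small unweighted graph. This is an illustrative assumption-laden implementation (exhaustive enumeration, BFS shortest paths), not the procedure used by any particular paper; note that trees attain $\delta = 0$, which is why tree-like datasets are described as "most hyperbolic."

```python
from itertools import combinations
from collections import deque

def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def gromov_delta(adj):
    """Four-point Gromov delta: max over quadruples of half the gap
    between the two largest of the three pairwise distance sums."""
    nodes = list(adj)
    d = {u: bfs_distances(adj, u) for u in nodes}
    delta = 0.0
    for x, y, u, v in combinations(nodes, 4):
        sums = sorted([d[x][y] + d[u][v],
                       d[x][u] + d[y][v],
                       d[x][v] + d[y][u]])
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta

# A perfect binary tree of depth 2: trees are 0-hyperbolic.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6],
        3: [1], 4: [1], 5: [2], 6: [2]}
# A 6-cycle is less tree-like: its four-point delta is 1.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

The brute-force loop is $O(n^4)$ in the number of nodes, so for real graph datasets practitioners typically sample quadruples rather than enumerate them; the abstract's critique of "misleading metrics" concerns how such aggregate summaries are interpreted, not the definition itself.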