The dichotomy between homophily and heterophily has long inspired research toward a better understanding of the inductive bias of Deep Graph Networks. In particular, homophily was believed to correlate strongly with better node classification performance for message-passing methods. More recently, however, researchers have pointed out that this dichotomy is too simplistic, since one can construct node classification tasks in which graphs are completely heterophilic yet performance remains high. Most of these works have also proposed new quantitative metrics for understanding when a graph structure is useful, and these metrics implicitly or explicitly assume a correlation between node features and target labels. Our work empirically investigates what happens when this strong assumption does not hold, by formalising two generative processes for node classification tasks that allow us to build and study ad-hoc problems. To quantitatively measure the influence of the node features on the target labels, we also use a metric we call Feature Informativeness. We construct six synthetic tasks and evaluate the performance of six models, including structure-agnostic ones. Our findings reveal that previously defined metrics are not adequate when we relax the above assumption. Our contribution to the workshop aims to present novel research findings that could help advance our understanding of the field.
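The claim that a completely heterophilic graph can still support accurate node classification can be illustrated with a small sketch. The example below is not taken from the paper: the even-length cycle, the helper names (`build_bipartite_cycle`, `edge_homophily`, `neighbour_rule_accuracy`), and the oracle-style rule that reads true neighbour labels are all assumptions chosen for illustration. It uses the common edge homophily ratio (the fraction of edges whose endpoints share a label) and shows that a ratio of zero does not prevent the neighbourhood from fully determining each node's label.

```python
from collections import Counter


def build_bipartite_cycle(n):
    """Hypothetical toy graph: an even-length cycle with alternating binary labels."""
    assert n % 2 == 0, "n must be even so that labels alternate around the cycle"
    edges = [(i, (i + 1) % n) for i in range(n)]
    labels = [i % 2 for i in range(n)]
    return edges, labels


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def neighbour_rule_accuracy(edges, labels):
    """Accuracy of the rule 'predict the opposite of the neighbours' majority label'.

    This is an oracle-style illustration (it reads the true neighbour labels),
    not a trained model and not the method evaluated in the paper.
    """
    neighbours = {i: [] for i in range(len(labels))}
    for u, v in edges:
        neighbours[u].append(labels[v])
        neighbours[v].append(labels[u])
    correct = 0
    for node, neighbour_labels in neighbours.items():
        majority = Counter(neighbour_labels).most_common(1)[0][0]
        correct += int((1 - majority) == labels[node])
    return correct / len(labels)


edges, labels = build_bipartite_cycle(8)
print("edge homophily:", edge_homophily(edges, labels))
print("neighbour-rule accuracy:", neighbour_rule_accuracy(edges, labels))
```

In this toy graph every edge connects nodes of different classes, yet the neighbourhood label pattern is perfectly predictive, which is why a single homophily score may say little about how useful the structure is.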