In reliable decision-making systems based on machine learning, models must be robust to distributional shifts or be able to estimate the uncertainty of their predictions. In node-level problems of graph learning, distributional shifts can be especially complex because the samples are interdependent. To evaluate the performance of graph models, it is important to test them on diverse and meaningful distributional shifts. However, most graph benchmarks that consider distributional shifts in node-level problems focus mainly on node features, even though structural properties are also essential for graph problems. In this work, we propose a general approach for inducing diverse distributional shifts based on graph structure. We use this approach to create data splits according to several structural node properties: popularity, locality, and density. In our experiments, we thoroughly evaluate the proposed distributional shifts and show that they can be quite challenging for existing graph models. We also reveal that simple models often outperform more sophisticated methods on the considered structural shifts. Finally, our experiments provide evidence of a trade-off between the quality of learned representations for the base classification task under structural distributional shift and the ability to separate nodes from different distributions using these representations.
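The idea of splitting nodes by a structural property can be illustrated with a minimal sketch. It assumes that a split is made by scoring every node with a structural measure, sorting the nodes by that score, and holding out one tail of the ranking as the out-of-distribution (OOD) part. The specific measures here are illustrative assumptions, not necessarily the ones used in this work: node degree stands in for popularity and the local clustering coefficient for density, while locality (which would need a distance-like score relative to some reference node) is omitted. All function names (`build_graph`, `degree_score`, `clustering_score`, `structural_split`) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency structure from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_score(graph: Graph) -> Dict[int, float]:
    """Popularity proxy (assumption): the degree of each node."""
    return {v: float(len(nbrs)) for v, nbrs in graph.items()}


def clustering_score(graph: Graph) -> Dict[int, float]:
    """Density proxy (assumption): the local clustering coefficient,
    i.e. the fraction of neighbor pairs that are themselves connected."""
    scores: Dict[int, float] = {}
    for v, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            scores[v] = 0.0
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in graph[nbr_list[i]]
        )
        scores[v] = 2.0 * links / (k * (k - 1))
    return scores


def structural_split(
    scores: Dict[int, float], ood_fraction: float = 0.3, ood_side: str = "low"
) -> Tuple[List[int], List[int]]:
    """Rank nodes by score (ties broken by node id) and hold out one tail.

    Returns (in_distribution_nodes, ood_nodes). With ood_side="low" the
    lowest-scoring nodes become OOD; with "high" the highest-scoring ones do.
    """
    if not 0.0 <= ood_fraction <= 1.0:
        raise ValueError("ood_fraction must be in [0, 1]")
    if ood_side not in ("low", "high"):
        raise ValueError("ood_side must be 'low' or 'high'")
    order = sorted(scores, key=lambda v: (scores[v], v))
    n_ood = round(ood_fraction * len(order))
    if ood_side == "low":
        return order[n_ood:], order[:n_ood]
    cut = len(order) - n_ood
    return order[:cut], order[cut:]


if __name__ == "__main__":
    # A triangle (0, 1, 2) with a short tail 2-3-4.
    g = build_graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)])
    id_nodes, ood_nodes = structural_split(degree_score(g), 0.4, "low")
    print("popularity split:", id_nodes, ood_nodes)
    id_nodes, ood_nodes = structural_split(clustering_score(g), 0.4, "high")
    print("density split:", id_nodes, ood_nodes)
```

Ranking by a single scalar score keeps the construction general: any other structural measure can be plugged into `structural_split` unchanged, and the choice of which tail is held out controls the direction of the shift.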