In reliable decision-making systems based on machine learning, models must either be robust to distributional shifts or provide uncertainty estimates for their predictions. In node-level graph learning problems, distributional shifts can be especially complex because the samples are interdependent. Evaluating graph models therefore requires testing them on diverse and meaningful distributional shifts. However, most graph benchmarks that consider distributional shifts for node-level problems focus mainly on node features, even though structural properties are also essential for graph problems. In this work, we propose a general approach for inducing diverse distributional shifts based on graph structure. We use this approach to create data splits according to several structural node properties: popularity, locality, and density. In our experiments, we thoroughly evaluate the proposed distributional shifts and show that they can be quite challenging for existing graph models. We also find that simple models often outperform more sophisticated methods on these challenging shifts. Finally, our experiments provide evidence of a trade-off between the quality of learned representations for the base classification task under structural distributional shift and the ability to separate nodes from different distributions using these representations.
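To make the idea of a structure-based split concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how popularity, locality, or density are computed, so the sketch assumes node degree as a stand-in for popularity and the local clustering coefficient as a stand-in for density; locality is omitted. The helper names (`degree`, `local_clustering`, `structural_split`), the toy graph, and the choice to treat the highest-scoring nodes as in-distribution are all hypothetical.

```python
from itertools import combinations


def degree(adj, v):
    """Assumed popularity proxy: number of neighbours of v."""
    return len(adj[v])


def local_clustering(adj, v):
    """Assumed density proxy: fraction of neighbour pairs of v that are linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def structural_split(adj, score, id_fraction=0.5):
    """Rank nodes by a structural score and cut the ranking in two.

    The top-scoring nodes form the in-distribution (ID) part and the rest
    form the out-of-distribution (OOD) part. Which end counts as ID is an
    assumption of this sketch.
    """
    # Ties are broken by node id so the split is deterministic.
    ranked = sorted(adj, key=lambda v: (score(adj, v), v), reverse=True)
    cut = int(len(ranked) * id_fraction)
    return ranked[:cut], ranked[cut:]


# Hypothetical undirected toy graph stored as adjacency sets.
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0},
    4: {0, 5},
    5: {4},
}

# Train on high-degree nodes, evaluate on low-degree nodes.
id_nodes, ood_nodes = structural_split(adj, degree, id_fraction=0.5)

# The same routine works with any other per-node structural score.
dense_nodes, sparse_nodes = structural_split(adj, local_clustering, id_fraction=0.5)
```

Because the split depends only on a per-node score, swapping in a different structural property changes the induced shift without touching the rest of the pipeline.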