Reliable decision-making systems based on machine learning require models that are either robust to distributional shifts or able to quantify the uncertainty of their predictions. In node-level graph learning problems, distributional shifts can be especially complex because the samples are interdependent. To evaluate graph models, it is therefore important to test them on diverse and meaningful distributional shifts. However, most graph benchmarks that consider distributional shifts for node-level problems focus mainly on node features, whereas data in graph problems is primarily defined by its structural properties. In this work, we propose a general approach for inducing diverse distributional shifts based on graph structure. We use this approach to create data splits according to several structural node properties: popularity, locality, and density. In our experiments, we thoroughly evaluate the proposed distributional shifts and show that they are challenging for existing graph models. We hope that the proposed approach will help the further development of reliable graph machine learning.
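To make the idea of a structure-based split concrete, the following is a minimal sketch, not the method of this work. It assumes node degree as a stand-in for popularity (the abstract does not name the measure used), and the helper names `build_adjacency` and `structural_split` are hypothetical. Nodes are ranked by the structural score; the top fraction forms the in-distribution part and the remainder forms the shifted, out-of-distribution part.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def structural_split(scores, id_fraction=0.5):
    """Rank nodes by a structural score (descending, ties broken by node id)
    and split them into in-distribution (ID) and out-of-distribution (OOD) parts.

    `scores` maps node -> score. Which score to use is an assumption of this
    sketch; here it is node degree as a proxy for popularity.
    """
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    cut = int(len(ranked) * id_fraction)
    return ranked[:cut], ranked[cut:]


# Toy graph: node 0 is a hub, node 5 is a peripheral leaf.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
adj = build_adjacency(edges)

# Degree as an assumed proxy for "popularity".
degree = {node: len(neighbors) for node, neighbors in adj.items()}

id_nodes, ood_nodes = structural_split(degree, id_fraction=0.5)
print("ID nodes: ", id_nodes)
print("OOD nodes:", ood_nodes)
```

A model would then be trained on the ID nodes and evaluated on the OOD nodes, so that the test distribution differs from the training distribution in the chosen structural property. Substituting a different score, for example one that captures locality or density, would give a different shift under the same splitting procedure.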