The history of artificial intelligence (AI) has witnessed the significant impact of high-quality data on deep learning models, for example ImageNet on AlexNet and ResNet. Recently, the attention of the AI community has shifted from model-centric approaches, which design ever more complex neural architectures, to data-centric ones, which focus on processing data better to strengthen the ability of neural models. Graph learning, which operates on ubiquitous topological data, also plays an important role in the era of deep learning. In this survey, we comprehensively review graph learning approaches from the data-centric perspective and aim to answer three crucial questions: (1) when to modify graph data, (2) what part of the graph data needs modification to unlock the potential of various graph models, and (3) how to safeguard graph models from the influence of problematic data. Accordingly, we propose a novel taxonomy based on the stages of the graph learning pipeline and highlight the processing methods for the different data structures in graph data, i.e., topology, features, and labels. Furthermore, we analyze potential problems embedded in graph data and discuss how to solve them in a data-centric manner. Finally, we outline promising future directions for data-centric graph learning.