When to Pre-Train Graph Neural Networks? An Answer from Data Generation Perspective!

From arXiv: This paper was withdrawn because it was submitted without the consent of one of the co-authors. It does not contain any errors that need to be corrected.

Graph pre-training has recently attracted wide research attention. It aims to learn transferable knowledge from unlabeled graph data in order to improve downstream performance. Despite these efforts, negative transfer remains a major issue when applying graph pre-trained models to downstream tasks. Existing work has focused on what to pre-train and how to pre-train, designing a number of graph pre-training and fine-tuning strategies. However, there are cases where, no matter how advanced the strategy is, the "pre-train and fine-tune" paradigm still cannot achieve clear benefits. This paper introduces a generic framework, W2PGNN, to answer the crucial question of when to pre-train (i.e., in what situations we can take advantage of graph pre-training) before performing costly pre-training or fine-tuning. We start from a new perspective and explore the complex generative mechanisms from the pre-training data to the downstream data. In particular, W2PGNN first fits the pre-training data into graphon bases; each element of a graphon basis (i.e., a graphon) identifies a fundamental transferable pattern shared by a collection of pre-training graphs. All convex combinations of the graphon bases give rise to a generator space, and the graphs generated from it form the solution space of downstream data that can benefit from pre-training. In this manner, the feasibility of pre-training can be quantified as the generation probability of the downstream data from any generator in the generator space. W2PGNN supports three broad applications: providing the application scope of graph pre-trained models, quantifying the feasibility of performing pre-training, and helping select pre-training data to enhance downstream performance. We give a theoretically sound solution for the first application and extensive empirical justification for the latter two.
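To make the generator-space idea more concrete, here is a minimal sketch in plain Python. It is not the W2PGNN implementation. It assumes each graphon is approximated by a k x k step function (a symmetric matrix of edge probabilities), and the names `mix_graphons` and `sample_graph` are hypothetical helpers introduced only for illustration. The first forms a convex combination of graphon bases (one point in the generator space), and the second samples a graph from the resulting generator.

```python
import random


def mix_graphons(bases, weights):
    """Convex combination of step-function graphons.

    Each basis is a k x k symmetric matrix (list of lists) of edge
    probabilities. This step-function form is an assumption of the sketch.
    """
    if len(bases) != len(weights):
        raise ValueError("need exactly one weight per basis")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    k = len(bases[0])
    return [
        [sum(w * b[i][j] for w, b in zip(weights, bases)) for j in range(k)]
        for i in range(k)
    ]


def sample_graph(graphon, n, rng):
    """Sample an undirected n-node graph from a step-function graphon.

    Every node draws a latent position u ~ Uniform[0, 1), which selects one
    of the k blocks. Each pair (i, j) with i < j is then connected with
    probability graphon[block_i][block_j].
    """
    k = len(graphon)
    blocks = [min(int(rng.random() * k), k - 1) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon[blocks[i]][blocks[j]]:
                edges.add((i, j))
    return edges


# Two illustrative bases (made-up values): a community-like pattern
# and a uniformly sparse pattern.
community = [[0.8, 0.1], [0.1, 0.8]]
sparse = [[0.2, 0.2], [0.2, 0.2]]

generator = mix_graphons([community, sparse], [0.5, 0.5])
graph = sample_graph(generator, n=20, rng=random.Random(0))
print(len(graph), "edges sampled from the mixed generator")
```

Varying the weights over the simplex sweeps out the generator space described above. Estimating how likely a given downstream graph is under some generator in that space (the feasibility score) is a separate step that this sketch does not attempt.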
