Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected at a large scale from various sources. A conventional approach is to train a model on the unlabeled graphs with self-supervised tasks and then fine-tune it on the prediction tasks. However, the knowledge learned from the self-supervised tasks may not align with, and sometimes conflicts with, what the prediction tasks need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives that guide the model's denoising process with each task's labeled data, generating task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fourteen existing methods on fifteen tasks. Unlike with self-supervised learning, the performance improvement brought by the unlabeled data is visible in the form of the generated labeled examples.