Graph-structured data is ubiquitous: it models complex relationships between objects and underpins a wide range of Web applications. The daily influx of unlabeled graph data on the Web offers immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. The resulting pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance. However, different graphs, even those from seemingly similar domains, can differ significantly in attribute semantics, which makes transferring pre-trained models to downstream tasks difficult, if not infeasible. Concretely, the additional task-specific node information available in downstream tasks (specificity) is usually deliberately omitted so that the pre-trained representation (transferability) can be leveraged. We term this trade-off the "transferability-specificity dilemma". To address this challenge, we introduce a deployment module named GraphControl, motivated by ControlNet, to achieve better graph domain transfer learning. Specifically, by leveraging universal structural pre-trained models together with GraphControl, we align the input space across various graphs and incorporate the unique characteristics of the target data as conditional inputs. These conditions are progressively integrated into the model during fine-tuning or prompt tuning through ControlNet, facilitating personalized deployment. Extensive experiments show that our method significantly enhances the adaptability of pre-trained models on target attributed datasets, achieving a 1.4-3x performance gain. Furthermore, it outperforms training-from-scratch methods on target data by a comparable margin and converges faster.
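To make the ControlNet-style conditioning described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the pattern known from ControlNet: a frozen pre-trained encoder, a trainable copy of it, and zero-initialized connections that let the condition enter gradually during tuning. Plain linear maps stand in for the graph encoder, and every name here (`ControlledEncoder`, `zero_in`, `zero_out`, `matvec`) is hypothetical. The condition vector is assumed to have the same dimension as the input, reflecting the aligned input space.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [u + v for u, v in zip(a, b)]


def zero_matrix(rows, cols):
    """Matrix of zeros, used for the zero-initialized connections."""
    return [[0.0] * cols for _ in range(rows)]


class ControlledEncoder:
    """Toy ControlNet-style wrapper (hypothetical, for illustration only).

    A single linear layer stands in for the pre-trained graph encoder.
    """

    def __init__(self, frozen_W):
        dim_out, dim_in = len(frozen_W), len(frozen_W[0])
        self.frozen_W = frozen_W                        # pre-trained weights, kept fixed
        self.copy_W = [row[:] for row in frozen_W]      # trainable copy of the encoder
        self.zero_in = zero_matrix(dim_in, dim_in)      # zero-init: condition -> copy input
        self.zero_out = zero_matrix(dim_out, dim_out)   # zero-init: copy output -> final output

    def forward(self, x, condition):
        # Frozen branch: the transferable, pre-trained representation.
        base = matvec(self.frozen_W, x)
        # Control branch: the condition is injected through a zero-initialized map.
        ctrl_input = add(x, matvec(self.zero_in, condition))
        ctrl = matvec(self.zero_out, matvec(self.copy_W, ctrl_input))
        return add(base, ctrl)


W = [[1.0, 2.0], [0.5, -1.0]]   # stand-in for pre-trained weights
x = [1.0, 1.0]                  # stand-in for the aligned structural input
c = [2.0, 0.0]                  # stand-in for the target-specific condition

model = ControlledEncoder(W)
out_init = model.forward(x, c)
print(out_init)                 # equals the frozen encoder's output at initialization

# Simulate the effect of tuning: the zero-initialized maps become non-zero,
# so the condition starts to influence the output.
model.zero_in = [[1.0, 0.0], [0.0, 1.0]]
model.zero_out = [[0.5, 0.0], [0.0, 0.5]]
out_tuned = model.forward(x, c)
print(out_tuned)
```

Because both connecting maps start at zero, the wrapped model initially reproduces the pre-trained output exactly, and the target-specific condition only takes effect as those maps are updated. This is one way to read "progressively integrated" in the abstract.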