Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing "pre-train, fine-tune" paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often suffers from a misalignment between the training objectives of pretext tasks and those of downstream tasks. This gap can result in the "negative transfer" problem, wherein the knowledge gained from pre-training adversely affects performance on the downstream tasks. The surge in prompt-based learning within Natural Language Processing (NLP) suggests the potential of adapting a "pre-train, prompt" paradigm to graphs as an alternative. However, existing graph prompting techniques are tailored to homogeneous graphs, neglecting the inherent heterogeneity of Web graphs. To bridge this gap, we propose HetGPT, a general post-training prompting framework to improve the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs). Its key component is a novel prompting function that integrates a virtual class prompt and a heterogeneous feature prompt, with the aim of reformulating downstream tasks to mirror pretext tasks. Moreover, HetGPT introduces a multi-view neighborhood aggregation mechanism that captures the complex neighborhood structure in heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT's capability to enhance the performance of state-of-the-art HGNNs on semi-supervised node classification.