Fine-tuning all parameters of a large-scale pre-trained model has become a popular paradigm for transferring its knowledge to various downstream tasks. However, as models grow and the number of downstream tasks rises, this paradigm inevitably runs into high computation and memory costs. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods (e.g., Adapter, LoRA, BitFit) have emerged as a promising way to alleviate these concerns by updating only a portion of the parameters. Although PEFT methods have demonstrated satisfactory performance in natural language processing, whether they can be transferred to graph-based tasks with Graph Transformer Networks (GTNs) remains under-explored. In this paper, we fill this gap by providing extensive benchmarks of traditional PEFT methods on a range of graph-based downstream tasks. Our empirical study shows that directly transferring existing PEFT methods to graph-based tasks is sub-optimal because of feature distribution shift. To address this issue, we propose a novel structure-aware PEFT approach, named G-Adapter, which uses a graph convolution operation to introduce graph structure (e.g., the graph adjacency matrix) as an inductive bias that guides the updating process. In addition, we propose Bregman proximal point optimization to further alleviate feature distribution shift by preventing the model from updating too aggressively. Extensive experiments demonstrate that G-Adapter achieves state-of-the-art performance compared with its counterparts on nine graph benchmark datasets with two pre-trained GTNs, and has a substantially smaller memory footprint than the conventional full fine-tuning paradigm.
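To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a bottleneck adapter whose down-projection is preceded by a GCN-style normalized adjacency multiplication, and it uses the squared Euclidean distance (the simplest Bregman divergence) as a stand-in for the proximal term. The function names, the ReLU activation, the residual connection, and the zero-initialized up-projection are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_adapter(h, adj, w_down, w_up):
    """Hypothetical structure-aware adapter: h + ReLU(S h W_down) W_up.

    h:      (num_nodes, d) node features from a frozen backbone layer
    adj:    (num_nodes, num_nodes) non-negative adjacency matrix
    w_down: (d, r) trainable down-projection, r << d
    w_up:   (r, d) trainable up-projection
    """
    s = normalize_adjacency(adj)
    z = np.maximum(s @ h @ w_down, 0.0)     # graph convolution, then ReLU
    return h + z @ w_up                     # residual keeps the backbone features


def proximal_penalty(h_new, h_prev):
    """Squared Euclidean distance between current and previous-iterate features.

    This is one Bregman divergence, chosen for simplicity; the divergence
    used in the paper may differ.
    """
    return float(np.mean(np.sum((h_new - h_prev) ** 2, axis=1)))


# Usage on a 3-node path graph with random features (illustrative values only).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h = rng.normal(size=(3, 4))
w_down = rng.normal(size=(4, 2))
w_up = np.zeros((2, 4))                     # zero init: adapter starts as identity
out = graph_adapter(h, adj, w_down, w_up)
penalty = proximal_penalty(out, h)
```

In this sketch only `w_down` and `w_up` would be trained, which is where the parameter savings come from. Adding `proximal_penalty`, scaled by a hypothetical coefficient, to the task loss would discourage the adapted features from drifting far from those of the previous iterate.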