Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard in graph representation learning. Recent works have focused on designing self-supervised pre-training tasks that extract useful and universal transferable knowledge from large-scale unlabeled data. However, these works face an unavoidable problem: traditional pre-training strategies, which aim to extract information useful for the pre-training tasks, may not extract all the information useful for the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of the Information Bottleneck (IB), and confirm that the forgetting phenomenon in the pre-training phase may have detrimental effects on downstream tasks. We therefore propose a novel \underline{D}elayed \underline{B}ottlenecking \underline{P}re-training (DBP) framework. During the pre-training phase, DBP suppresses the compression operation in order to maintain as much mutual information as possible between the latent representations and the training data. It then delays compression to the fine-tuning phase, so that compression can be guided by labeled fine-tuning data and the downstream task. To achieve this, we design two directly optimizable information control objectives and integrate them into the actual model design. Extensive experiments in both the chemistry and biology domains demonstrate the effectiveness of DBP.
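For reference, the IB perspective can be anchored to the generic Information Bottleneck Lagrangian shown below. The notation is ours, not the paper's: $X$ is the input graph, $Z$ is its latent representation, $Y$ is the task target, and $\beta$ is a trade-off weight. This is only the standard IB formulation. It is not the DBP objective, and the abstract does not specify DBP's two information control objectives.

\[
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta\, I(Z;Y), \qquad \beta > 0.
\]

Under one reading of the abstract, minimizing the first term $I(X;Z)$ is the "compression" it refers to. Keeping $I(X;Z)$ as large as possible during pre-training therefore amounts to suppressing this term. Deferring it to fine-tuning means the trade-off against $I(Z;Y)$ is made only once the labeled downstream target $Y$ is available.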