When handling streaming graphs, existing graph representation learning models suffer from catastrophic forgetting: previously learned knowledge is easily overwritten when the model learns from newly incoming graphs. In response, Continual Graph Learning (CGL) has emerged as a novel paradigm that extends graph representation learning from static to streaming graphs. Our prior work, CaT, is a replay-based framework with a balanced continual learning procedure, which builds a small yet effective memory bank for replay by condensing incoming graphs. Although CaT alleviates catastrophic forgetting, three issues remain: (1) the graph condensation algorithm in CaT focuses only on labelled nodes and neglects the abundant information carried by unlabelled nodes; (2) the continual training scheme of CaT overemphasises previously learned knowledge, limiting the model's capacity to learn from newly added memories; (3) both the condensation process and the replay process of CaT are time-consuming. In this paper, we propose a pseudo-label guided memory bank (PUMA) CGL framework, which extends CaT and improves its efficiency and effectiveness by addressing these weaknesses. To fully exploit the information in a graph, PUMA expands the node coverage during graph condensation to include both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme and balance training between historical and new graphs. In addition, PUMA uses one-time propagation and wide graph encoders to accelerate graph condensation and graph encoding during training, improving the efficiency of the whole framework. Extensive experiments on four datasets demonstrate state-of-the-art performance and efficiency compared with existing methods.