Lifelong Learning on Evolving Graphs Under the Constraints of Imbalanced Classes and New Classes

Lifelong graph learning deals with the problem of continually adapting graph neural network (GNN) models to changes in evolving graphs. We address two critical challenges of lifelong graph learning in this work: dealing with new classes and tackling imbalanced class distributions. The combination of these two challenges is particularly relevant since newly emerging classes typically make up only a tiny fraction of the data, adding to the already skewed class distribution. We make several contributions. First, we show that the amount of unlabeled data does not influence the results, which is an essential prerequisite for lifelong learning on a sequence of tasks. Second, we experiment with different label rates and show that our methods can perform well with only a tiny fraction of annotated nodes. Third, we propose the gDOC method to detect new classes under the constraint of having an imbalanced class distribution. The critical ingredient is a weighted binary cross-entropy loss function to account for the class imbalance. Moreover, we demonstrate combinations of gDOC with various base GNN models such as GraphSAGE, Simplified Graph Convolution, and Graph Attention Networks. Lastly, our k-neighborhood time difference measure provably normalizes the temporal changes across different graph datasets. With extensive experimentation, we find that the proposed gDOC method is consistently better than a naive adaptation of DOC to graphs. Specifically, in experiments using the smallest history size, the out-of-distribution detection score of gDOC is 0.09 compared to 0.01 for DOC. Furthermore, gDOC achieves an Open-F1 score, a combined measure of in-distribution classification and out-of-distribution detection, of 0.33 compared to 0.25 for DOC (a 32% increase).
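
To illustrate the idea of a weighted binary cross-entropy loss combined with threshold-based detection of new classes, the following is a minimal NumPy sketch, not the gDOC implementation. The abstract does not specify the weighting scheme, so the inverse-frequency weights, the per-node scaling by the true class weight, the fixed rejection threshold of 0.5, and all function names (class_weights, weighted_bce, predict_with_rejection) are assumptions made for illustration. The logits stand in for the output of any base GNN.

```python
import numpy as np


def class_weights(labels, num_classes):
    """Inverse-frequency class weights (assumed scheme): rare classes weigh more."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # avoid division by zero for empty classes
    return len(labels) / (num_classes * counts)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def weighted_bce(logits, labels, weights, eps=1e-12):
    """One-vs-rest binary cross-entropy over sigmoid outputs.

    Each node's loss is scaled by the weight of its true class
    (an assumption; other placements of the weights are possible).
    """
    probs = sigmoid(logits)
    targets = np.eye(logits.shape[1])[labels]  # one-hot targets per node
    per_entry = -(targets * np.log(probs + eps)
                  + (1.0 - targets) * np.log(1.0 - probs + eps))
    per_node = per_entry.sum(axis=1) * weights[labels]
    return per_node.mean()


def predict_with_rejection(logits, threshold=0.5, unseen=-1):
    """Predict the most confident known class, or `unseen` if no class
    reaches the threshold (threshold-based rejection of new classes)."""
    probs = sigmoid(logits)
    best = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) >= threshold, best, unseen)


# Toy example: three nodes of class 0, one node of the rare class 1.
labels = np.array([0, 0, 0, 1])
logits = np.array([[4.0, -4.0], [3.0, -2.0], [2.0, -3.0], [-3.0, -2.0]])
weights = class_weights(labels, num_classes=2)
loss = weighted_bce(logits, labels, weights)
predictions = predict_with_rejection(logits)  # the last node is rejected as unseen
```

In this sketch, a node whose sigmoid outputs all stay below the threshold is treated as belonging to a new class, while the class weights keep the rare classes from being dominated by the frequent ones during training.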
