Enhancing Imbalanced Node Classification via Curriculum-Guided Feature Learning and Three-Stage Attention Network

Imbalanced node classification in graph neural networks (GNNs) arises when some classes are much more common than others, which biases the model toward the majority classes and degrades performance on the minority classes. To address this problem, we propose the Curriculum-Guided Feature Learning and Three-Stage Attention Network (CL3AN-GNN), which uses a three-stage attention mechanism (Engage, Enact, Embed) modelled on how humans learn. In the Engage stage, the model starts with structurally simpler features, defined as (1) local (1-hop) neighbourhood patterns, (2) attributes of low-degree nodes, and (3) class-separable node pairs identified from initial graph convolutional network (GCN) and graph attention network (GAT) embeddings. This foundation enables stable early learning despite label skew. The Enact stage then uses adjustable attention weights to handle more complex structure: (1) multi-hop connections, (2) edges that connect nodes of different types, and (3) nodes on the boundaries of minority classes. Finally, the Embed stage consolidates these features through iterative message passing and curriculum-aligned loss weighting. We evaluate CL3AN-GNN on eight Open Graph Benchmark datasets spanning social, biological, and citation networks. Experiments show consistent improvements in accuracy, F1-score, and AUC across all datasets over recent state-of-the-art methods. The staged approach works well across different types of graph datasets: it converges faster than training on everything at once, generalises better to new imbalanced graphs, and makes each stage interpretable through gradient-stability and attention-correlation learning curves. This work provides both a theoretically grounded framework for curriculum learning in GNNs and practical evidence of its effectiveness against class imbalance, validated through performance metrics, convergence speed, and generalisation tests.
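To make the idea of curriculum-aligned loss weighting more concrete, the following is a minimal NumPy sketch, not the CL3AN-GNN implementation. It assumes that "easy" nodes are selected by a degree threshold and that class weights move linearly from uniform to inverse-frequency as a `progress` value goes from 0 to 1. The function names (`low_degree_mask`, `curriculum_class_weights`, `weighted_cross_entropy`), the threshold, and the linear schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def low_degree_mask(adj, max_degree):
    """Mark nodes whose degree is at most max_degree.

    Assumption: low degree stands in for the 'structurally simpler' nodes.
    """
    degrees = adj.sum(axis=1)
    return degrees <= max_degree


def curriculum_class_weights(labels, num_classes, progress):
    """Blend uniform weights (progress=0) with inverse-frequency weights (progress=1).

    Assumption: a linear blend; the actual schedule may differ.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv_freq = counts.sum() / (num_classes * np.maximum(counts, 1.0))
    return (1.0 - progress) * np.ones(num_classes) + progress * inv_freq


def weighted_cross_entropy(logits, labels, class_weights, mask):
    """Class-weighted cross-entropy, averaged over the nodes selected by mask."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    w = class_weights[labels] * mask
    return float((w * nll).sum() / max(w.sum(), 1e-12))


# Toy usage: a star graph with one hub (node 1) and a 3:1 class imbalance.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]])
labels = np.array([0, 0, 0, 1])
logits = np.zeros((4, 2))  # stand-in for GNN outputs

early_mask = low_degree_mask(adj, max_degree=1)           # early stage: easy nodes only
early_weights = curriculum_class_weights(labels, 2, 0.0)  # uniform class weights
late_mask = np.ones(4, dtype=bool)                        # later stage: all nodes
late_weights = curriculum_class_weights(labels, 2, 1.0)   # minority class up-weighted

early_loss = weighted_cross_entropy(logits, labels, early_weights, early_mask)
late_loss = weighted_cross_entropy(logits, labels, late_weights, late_mask)
```

In this sketch the early stage trains only on low-degree nodes with uniform class weights, and later stages add the remaining nodes while shifting weight toward the minority class. A real model would replace the zero `logits` with the outputs of a GNN.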
