In long-tail classification on graphs, most existing work focuses on model debiasing strategies that aim to mitigate class imbalance and improve overall performance. Despite this notable success, very little literature offers a theoretical tool for characterizing the behavior of long-tail classes on graphs or for understanding generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem as multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building on these findings, we propose HierTail, a novel generic framework for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that assigns related tasks to hypertasks and thus controls the complexity of the task space; we then design a balanced contrastive learning module that adaptively balances the gradients of head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HierTail in characterizing long-tail classes on real graphs, where it achieves up to a 12.9% improvement in accuracy over the leading baseline method.
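To make the task-grouping idea concrete, the following is a minimal sketch and not HierTail's actual module. It assumes each class (task) is summarized by a prototype, taken here to be the mean of its node embeddings, and that related tasks are merged into hypertasks by greedy centroid-based agglomeration. The function names `class_prototypes` and `group_tasks`, the toy embeddings, and the choice of Euclidean distance are all illustrative assumptions rather than details from the paper.

```python
import math
from collections import Counter


def class_prototypes(embeddings, labels):
    """Return {class: mean embedding}; an assumed stand-in for a task representation."""
    counts = Counter(labels)
    sums = {}
    for vec, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def group_tasks(prototypes, num_hypertasks):
    """Greedily merge the two closest groups until num_hypertasks groups remain.

    Each group is (member classes, centroid of member prototypes).
    This is a generic agglomerative heuristic, used only for illustration.
    """
    groups = [([c], list(p)) for c, p in sorted(prototypes.items())]
    while len(groups) > num_hypertasks:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = math.dist(groups[i][1], groups[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        ni, nj = len(groups[i][0]), len(groups[j][0])
        # Centroid of the merged group: mean over all member prototypes.
        centroid = [(ni * a + nj * b) / (ni + nj)
                    for a, b in zip(groups[i][1], groups[j][1])]
        merged = (groups[i][0] + groups[j][0], centroid)
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in groups)


# Toy data (hypothetical): class 0 is a head class, classes 1 and 3 are tail classes.
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [12.0, 12.0]]
labels = [0, 0, 0, 1, 2, 2, 3]

hypertasks = group_tasks(class_prototypes(embeddings, labels), num_hypertasks=2)
print(hypertasks)
```

Under these assumptions, tail classes end up sharing a hypertask with nearby classes, which reduces the number of distinct tasks the model must separate. This is the intuition behind controlling the complexity of the task space.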