Distributed Graph Neural Network (GNN) training suffers from substantial communication overhead due to the inherent neighborhood dependency in graph-structured data. This neighbor explosion problem requires workers to frequently exchange boundary node features across partitions, creating a communication bottleneck that severely limits training scalability. Existing approaches rely on static graph partitioning strategies that cannot adapt to dynamic network conditions. In this paper, we propose CondenseGraph, a novel communication-efficient framework for distributed GNN training. Our key innovation is an on-the-fly graph condensation mechanism that dynamically compresses boundary node features into compact super nodes before transmission. To compensate for the information loss introduced by compression, we develop a gradient-based error feedback mechanism that maintains convergence guarantees while reducing communication volume by 40-60%. Extensive experiments on four benchmark datasets demonstrate that CondenseGraph achieves comparable accuracy to full-precision baselines while significantly reducing communication costs and training time.
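The combination of feature compression with error feedback described above can be sketched as follows. This is an illustrative example of the general error-feedback pattern, not the paper's actual algorithm: the function names, the round-robin super-node assignment, and the mean-pooling compressor are all assumptions made for demonstration (a real system would use a learned or clustering-based assignment).

```python
import numpy as np

def compress_with_error_feedback(features, error, num_super):
    """Compress boundary node features into super-node averages,
    carrying the compression residual forward (error feedback).
    Illustrative sketch only; not CondenseGraph's exact method."""
    corrected = features + error          # apply residual from the last round
    n, d = corrected.shape
    # Assign nodes to super nodes round-robin (hypothetical stand-in
    # for a real clustering of boundary nodes).
    groups = np.arange(n) % num_super
    super_feats = np.zeros((num_super, d))
    for g in range(num_super):
        super_feats[g] = corrected[groups == g].mean(axis=0)
    # What the receiving worker reconstructs: each boundary node
    # inherits its super node's feature vector.
    reconstructed = super_feats[groups]
    new_error = corrected - reconstructed  # residual fed back next round
    return super_feats, groups, new_error
```

Only `super_feats` and the `groups` assignment cross the network, so transmitting 2 super nodes in place of 8 boundary nodes cuts the feature payload by 4x; the residual `new_error` stays local and is added back before the next compression, which is what lets error-feedback schemes retain convergence despite lossy transmission.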