Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce this overhead, most focus solely on compressing individual gradients, overlooking the temporal correlations among them. Prior studies have shown that gradients exhibit spatial correlations, typically reflected in low-rank structures. Through empirical analysis, we further observe a strong temporal correlation between client gradients across adjacent rounds. Based on these observations, we propose GradESTC, a compression technique that exploits both spatial and temporal gradient correlations. Leveraging spatial correlations, GradESTC decomposes each full gradient into a compact set of basis vectors and corresponding combination coefficients; leveraging temporal correlations, it dynamically updates only a small fraction of the basis vectors in each round. By transmitting lightweight combination coefficients and a limited number of updated basis vectors instead of full gradients, GradESTC significantly reduces communication overhead. Extensive experiments show that, upon reaching a target accuracy level near convergence, GradESTC reduces uplink communication by an average of 39.79% compared to the strongest baseline, while maintaining convergence speed and final accuracy comparable to uncompressed FedAvg. By effectively leveraging spatio-temporal gradient structure, GradESTC offers a practical and scalable solution for communication-efficient federated learning.
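The spatial-plus-temporal idea above can be illustrated with a minimal sketch. This is our own toy reconstruction, not GradESTC's actual algorithm: the truncated-SVD factorization, the subspace-alignment test, and the `tol` threshold are all illustrative assumptions. It factorizes each round's (reshaped) gradient into basis vectors and combination coefficients, then reuses last round's basis vectors that still align with the new gradient's dominant subspace, so only the replaced ones plus the small coefficient matrix would need to be sent.

```python
import numpy as np

rng = np.random.default_rng(0)

def factorize(grad, rank):
    """Spatial step: low-rank factorization grad ~ basis @ coeffs."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank], s[:rank, None] * Vt[:rank]

def round_payload(grad, prev_basis, rank, tol=0.9):
    """Temporal step (illustrative): keep previous basis vectors that still
    align with the new gradient's top subspace; mark the rest as 'fresh',
    i.e. the only basis vectors that would be uploaded this round."""
    basis, _ = factorize(grad, rank)
    if prev_basis is None:
        fresh = list(range(rank))  # first round: all basis vectors are sent
    else:
        # Projection norm of each old basis vector onto the new subspace
        # (close to 1 when the direction is still well explained).
        align = np.linalg.norm(basis.T @ prev_basis, axis=0)
        fresh = [j for j in range(rank) if align[j] < tol]
        keep = [j for j in range(rank) if j not in fresh]
        basis[:, keep] = prev_basis[:, keep]  # reuse stale-but-valid vectors
    # Lightweight combination coefficients w.r.t. the partially reused basis.
    coeffs, *_ = np.linalg.lstsq(basis, grad, rcond=None)
    return basis, coeffs, fresh

# Two rounds with strongly correlated gradients (planted low-rank structure).
G1 = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 8))
G2 = G1 + 0.01 * rng.normal(size=(64, 8))  # next round: small temporal drift

basis1, coeffs1, fresh1 = round_payload(G1, None, rank=4)
basis2, coeffs2, fresh2 = round_payload(G2, basis1, rank=4)
```

In this toy run the second round reuses almost all of the first round's basis vectors, so the uplink payload shrinks to the coefficient matrix plus the few refreshed columns, mirroring the savings mechanism described above.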