Federated learning protocols require repeated synchronization between clients and a central server, with convergence rates depending on learning rates, data heterogeneity, and client sampling. This paper asks whether iterative communication is necessary for distributed linear regression. We show it is not. We formulate federated ridge regression as a distributed equilibrium problem where each client computes local sufficient statistics -- the Gram matrix and moment vector -- and transmits them once. The server reconstructs the global solution through a single matrix inversion. We prove exact recovery: under a coverage condition on client feature matrices, one-shot aggregation yields the centralized ridge solution, not an approximation. For heterogeneous distributions violating coverage, we derive non-asymptotic error bounds depending on spectral properties of the aggregated Gram matrix. Communication reduces from $\mathcal{O}(Rd)$ in iterative methods to $\mathcal{O}(d^2)$ total; for high-dimensional settings, we propose and experimentally validate random projection techniques reducing this to $\mathcal{O}(m^2)$ where $m \ll d$. We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols. We further address practical considerations including client dropout robustness, federated cross-validation for hyperparameter selection, and comparison with gradient-based alternatives. Comprehensive experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38\times$ less communication. The framework applies to kernel methods and random feature models but not to general nonlinear architectures.
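The core fusion step can be sketched in a few lines. This is an illustrative reconstruction, not the paper's reference implementation: the function names `client_statistics` and `server_fuse`, the regularization value, and the synthetic data are ours. The sketch checks the exact-recovery claim numerically: summing per-client Gram matrices and moment vectors and inverting once reproduces the centralized ridge solution.

```python
# One-shot federated ridge fusion -- a minimal sketch (names are illustrative).
# Each client k sends its Gram matrix X_k^T X_k and moment vector X_k^T y_k once;
# the server sums them and solves a single regularized linear system.
import numpy as np

def client_statistics(X, y):
    """Local sufficient statistics: Gram matrix and moment vector."""
    return X.T @ X, X.T @ y

def server_fuse(stats, lam):
    """One-shot aggregation: sum the client statistics, invert once."""
    G = sum(g for g, _ in stats)
    b = sum(v for _, v in stats)
    d = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(d), b)

rng = np.random.default_rng(0)
d, lam = 5, 0.1
# Three clients with heterogeneous local sample sizes.
clients = [(rng.normal(size=(n, d)), rng.normal(size=n)) for n in (20, 35, 50)]
w_fused = server_fuse([client_statistics(X, y) for X, y in clients], lam)

# Exact recovery: the fused solution equals the centralized ridge solution,
# because stacking the client data gives X^T X = sum_k X_k^T X_k.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_central = np.linalg.solve(X_all.T @ X_all + lam * np.eye(d),
                            X_all.T @ y_all)
print(np.allclose(w_fused, w_central))  # True
```

Each client transmits one $d \times d$ matrix and one $d$-vector, so total communication is $\mathcal{O}(d^2)$ per client regardless of how many iterations a gradient-based method would have required.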