This paper addresses distributed stochastic optimization under non-i.i.d. data, focusing on the inherent trade-off between communication and computational efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method that allows tunable numbers of local updates and neighbor communications per round, and thus adapts to diverse system resource conditions. Using a unified convergence analysis framework, we derive tight communication and computational complexities for FlexGT with explicit dependence on the properties of the objective and on certain tunable parameters. Moreover, we introduce an accelerated variant, Acc-FlexGT, and prove that, with prior knowledge of the graph, it achieves Pareto-optimal trade-offs between communication and computation. In particular, in the nonconvex case, Acc-FlexGT achieves the optimal iteration complexity of $\tilde{\mathcal{O}}\left( \left( Lσ^2 \right) /\left( nε^2 \right) +L/\left( ε\sqrt{1-\sqrt{ρ_W}} \right) \right)$ and the optimal communication complexity of $\tilde{\mathcal{O}}\left( L/\left( ε\sqrt{1-\sqrt{ρ_W}} \right) \right)$ for appropriately chosen numbers of local updates, matching existing lower bounds up to logarithmic factors. In the strongly convex case, it improves existing results by a factor of $\tilde{\mathcal{O}}\left( 1/\sqrt{ε} \right)$. Here, $ε$ is the target accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $ρ_W$ the connectivity of the graph, and $σ$ the stochastic gradient variance. Numerical experiments corroborate the theoretical results and demonstrate the effectiveness of the proposed methods.
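To make the two nonconvex bounds concrete, the following minimal sketch evaluates their terms for given problem parameters. It is an illustration only, not part of FlexGT or Acc-FlexGT: the function name `nonconvex_bound_terms` is hypothetical, all constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are dropped, and $ρ_W$ is assumed to lie in $[0, 1)$ so that the square roots are well defined.

```python
import math


def nonconvex_bound_terms(L, sigma, n, eps, rho_w):
    """Evaluate the terms of the stated nonconvex complexity bounds.

    Hypothetical helper for illustration: constants and logarithmic
    factors hidden by the tilde-O notation are dropped, so the values
    show scaling behavior only, not actual iteration counts.
    """
    # Assumption: rho_w in [0, 1), so that 1 - sqrt(rho_w) > 0.
    if not 0.0 <= rho_w < 1.0:
        raise ValueError("rho_w must lie in [0, 1)")
    if eps <= 0.0 or n <= 0:
        raise ValueError("eps and n must be positive")

    # Stochastic term L * sigma^2 / (n * eps^2): shrinks with more nodes.
    stochastic_term = L * sigma**2 / (n * eps**2)
    # Network term L / (eps * sqrt(1 - sqrt(rho_w))): grows as rho_w -> 1.
    network_term = L / (eps * math.sqrt(1.0 - math.sqrt(rho_w)))

    return {
        "stochastic_term": stochastic_term,
        "network_term": network_term,
        # Iteration bound is the sum of both terms.
        "iterations": stochastic_term + network_term,
        # Communication bound consists of the network term only.
        "communications": network_term,
    }


if __name__ == "__main__":
    terms = nonconvex_bound_terms(L=1.0, sigma=1.0, n=10, eps=0.01, rho_w=0.81)
    for name, value in terms.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the communication bound never exceeds the iteration bound, and the gap between them is exactly the stochastic term $Lσ^2/(nε^2)$, which the network does not affect.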