We propose two variants of the Primal-Dual Hybrid Gradient (PDHG) algorithm for saddle point problems with block-decomposable duals, hereafter called Multi-Timescale PDHG (MT-PDHG) and its accelerated variant (AMT-PDHG). Through a novel combination of Bregman divergences and multi-timescale extrapolations, our MT-PDHG and AMT-PDHG converge under arbitrary update rates for the different dual blocks while remaining fully deterministic and robust to extreme delays in the dual updates. We further apply our (A)MT-PDHG, augmented with the gradient sliding techniques introduced in Lan (2016) and Lan et al. (2020), to distributed optimization. The flexibility to choose a different update rate for each block permits finer control over the number of communication rounds between each pair of agents, thereby improving efficiency in settings with heterogeneous local objectives and communication costs. Moreover, with careful choices of the penalty levels, our algorithms exhibit a linear, and hence optimal, dependence on the function similarity, a measure of how close the gradients of the local objectives are. This answers in the affirmative the open question of whether such dependence is achievable for non-smooth objectives (Arjevani and Shamir, 2015).
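For concreteness, the following is a minimal sketch of the problem template and of the classical PDHG iteration that the abstract builds on; the block notation ($K_i$, $g_i^*$) and the step-size symbols ($\sigma$, $\tau$, $\theta$) are illustrative assumptions, not necessarily the paper's.

\[
\min_{x \in X} \max_{y \in Y} \; f(x) + \sum_{i=1}^{m} \Big( \langle K_i x,\, y_i \rangle - g_i^*(y_i) \Big),
\qquad y = (y_1, \dots, y_m),
\]
\[
y_i^{k+1} = \operatorname{prox}_{\sigma g_i^*}\!\big( y_i^k + \sigma K_i \bar{x}^k \big), \quad i = 1, \dots, m,
\qquad
x^{k+1} = \operatorname{prox}_{\tau f}\!\Big( x^k - \tau \sum_{i=1}^{m} K_i^\top y_i^{k+1} \Big),
\qquad
\bar{x}^{k+1} = x^{k+1} + \theta\,\big( x^{k+1} - x^k \big).
\]

Classical PDHG updates every dual block at every iteration; per the abstract, MT-PDHG instead lets each block $y_i$ be updated at its own, arbitrary rate, with the Bregman-divergence and multi-timescale extrapolation terms (whose exact form is given in the paper body, not reproduced here) compensating for the stale blocks.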