We analyze two classical algorithms for solving additively composite convex optimization problems, where the objective is the sum of a smooth term and a nonsmooth regularizer: the proximal stochastic gradient method, for a single regularizer; and the randomized incremental proximal method, which applies the proximal operator of a randomly selected component when the regularizer is given as a sum of many nonsmooth functions. We focus on relaxing the bounded variance assumption, which is common, yet stringent, in analyses of last-iterate convergence rates. Under componentwise convexity and smoothness, we prove an $\widetilde{O}(1/\sqrt{T})$ convergence rate for the last iterate of both algorithms, which is optimal up to logarithmic factors. Our results apply directly to graph-guided regularizers arising in multi-task and federated learning, where the regularizer decomposes as a sum over the edges of a collaboration graph.
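For concreteness, the two updates can be sketched as follows; the notation here is illustrative and not fixed by the abstract: $g_t$ denotes a stochastic gradient of the smooth term at $x_t$, $\gamma_t$ a step size, and $j_t$ a uniformly sampled index.
$$x_{t+1} = \operatorname{prox}_{\gamma_t r}\big(x_t - \gamma_t g_t\big) \qquad \text{(proximal stochastic gradient, regularizer } r\text{)},$$
$$x_{t+1} = \operatorname{prox}_{\gamma_t r_{j_t}}\big(x_t - \gamma_t g_t\big), \quad j_t \sim \mathrm{Unif}\{1,\dots,m\} \qquad \text{(randomized incremental proximal, } r = \textstyle\sum_{j=1}^{m} r_j\text{)}.$$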