To obtain optimal first-order convergence guarantees in stochastic optimization, it is necessary to use a recurrent data sampling algorithm, that is, one that samples every data point with sufficient frequency. Most commonly used data sampling algorithms (e.g., i.i.d., MCMC, random reshuffling) are indeed recurrent under mild assumptions. In this work, we show that for a particular class of stochastic optimization algorithms, recurrence is the only property of the data sampling algorithm needed to guarantee the optimal rate of first-order convergence; no further property (e.g., independence, exponential mixing, or reshuffling) is required. Namely, using regularized versions of Minimization by Incremental Surrogate Optimization (MISO), we show that for non-convex and possibly non-smooth objective functions, the expected optimality gap converges at the optimal rate $O(n^{-1/2})$ under general recurrent sampling schemes. Furthermore, the implied constant depends explicitly on the `speed of recurrence', measured by the expected amount of time to visit a given data point, either averaged (`target time') or taken in supremum (`hitting time') over the current location. We demonstrate theoretically and empirically that convergence can be accelerated by selecting sampling algorithms that cover the data set most effectively. We discuss applications of our general framework to decentralized optimization and distributed non-negative matrix factorization.
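To make the `speed of recurrence' concrete, the following is a minimal sketch that computes expected hitting times for a few sampling schemes modeled as Markov chains on the data indices. It is an illustration under stated assumptions, not the paper's implementation: the function names (`hitting_time_matrix`, `worst_case_hitting_time`, `averaged_hitting_time`) are hypothetical, the data set size `n = 8` is arbitrary, and the averaged quantity uses a uniform average over the starting location, which may differ from the exact definition of target time used in the paper.

```python
import numpy as np


def hitting_time_matrix(P):
    """Return H with H[x, y] = expected number of steps for the chain with
    transition matrix P to first reach y when started at x (0 when x == y)."""
    n = P.shape[0]
    H = np.zeros((n, n))
    for y in range(n):
        others = [x for x in range(n) if x != y]
        # For x != y: h(x) = 1 + sum_{z != y} P[x, z] h(z),
        # i.e. (I - P restricted to states other than y) h = 1.
        A = np.eye(n - 1) - P[np.ix_(others, others)]
        H[others, y] = np.linalg.solve(A, np.ones(n - 1))
    return H


def worst_case_hitting_time(P):
    """Supremum of the expected hitting time over start and target."""
    return hitting_time_matrix(P).max()


def averaged_hitting_time(P):
    """Expected hitting time averaged uniformly over the start (an assumption
    of this sketch), then maximized over the target data point."""
    return hitting_time_matrix(P).mean(axis=0).max()


n = 8  # illustrative number of data points

# i.i.d. uniform sampling: every index is equally likely at every step.
iid = np.full((n, n), 1.0 / n)

# Deterministic cyclic sampling: index i is always followed by i + 1 (mod n).
cyclic = np.roll(np.eye(n), 1, axis=1)

# Simple random walk on a cycle graph: move to either neighbor with prob. 1/2.
walk = 0.5 * np.roll(np.eye(n), 1, axis=1) + 0.5 * np.roll(np.eye(n), -1, axis=1)

for name, P in [("iid", iid), ("cyclic", cyclic), ("walk", walk)]:
    print(name, worst_case_hitting_time(P), averaged_hitting_time(P))
```

In this toy comparison, the cyclic scheme covers the data set fastest and the random walk on the cycle slowest, which is consistent with the claim that the constant in the convergence rate improves for sampling algorithms that cover the data more effectively.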