Stochastic majorization-minimization (SMM) is a class of stochastic optimization algorithms that proceed by sampling new data points and minimizing a recursive average of surrogate functions of an objective function. The surrogates are required to be strongly convex, and a convergence rate analysis for the general nonconvex setting has not been available. In this paper, we propose an extension of SMM in which the surrogates are allowed to be only weakly convex or block multi-convex, and the averaged surrogates are approximately minimized with proximal regularization or block-minimized within diminishing radii, respectively. For the general nonconvex constrained setting with non-i.i.d. data samples, we show that the first-order optimality gap of the proposed algorithm decays at the rate $O((\log n)^{1+\epsilon}/n^{1/2})$ for the empirical loss and $O((\log n)^{1+\epsilon}/n^{1/4})$ for the expected loss, where $n$ denotes the number of data samples processed. Under an additional assumption, the latter convergence rate can be improved to $O((\log n)^{1+\epsilon}/n^{1/2})$. As a corollary, we obtain the first convergence rate bounds for various optimization methods in the general nonconvex, dependent-data setting: double-averaging projected gradient descent and its generalizations, proximal point empirical risk minimization, and online matrix/tensor decomposition algorithms. We also provide experimental validation of our results.
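The recursion described in the abstract (sample a data point, form a surrogate, average it recursively, then minimize the average with a proximal term) can be illustrated with a toy sketch. This is not the paper's algorithm or code: it assumes a least-squares loss, an isotropic quadratic (prox-linear) surrogate with curvature `L`, averaging weights $w_n = 1/n$, a Euclidean-ball constraint, and an exact minimization step. The names `smm_least_squares`, `project_ball`, `rho`, and `radius` are hypothetical. The quadratic surrogate used here is strongly convex, so the proximal term is not strictly needed in this toy case; it is included only to show where it would enter.

```python
import numpy as np


def project_ball(theta, radius):
    """Euclidean projection onto the ball {theta : ||theta|| <= radius}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)


def smm_least_squares(X, y, L, rho=1.0, radius=10.0):
    """Toy SMM loop for f_n(theta) = 0.5 * (x_n @ theta - y_n)**2.

    Assumed surrogate at the previous iterate theta_prev:
        g_n(theta) = f_n(theta_prev) + grad @ (theta - theta_prev)
                     + (L / 2) * ||theta - theta_prev||**2,
    which majorizes f_n whenever L >= ||x_n||**2.
    Up to a constant, the averaged surrogate is
        (L / 2) * ||theta||**2 - b @ theta,
    so only the vector b has to be stored and updated.
    """
    n_samples, dim = X.shape
    theta = np.zeros(dim)
    b = np.zeros(dim)
    for n in range(1, n_samples + 1):
        x_n, y_n = X[n - 1], y[n - 1]
        grad = (x_n @ theta - y_n) * x_n   # gradient of f_n at theta_prev
        w = 1.0 / n                        # assumed averaging weight
        # Recursive average: g_bar_n = (1 - w) * g_bar_{n-1} + w * g_n.
        b = (1.0 - w) * b + w * (L * theta - grad)
        # Minimize g_bar_n(theta) + (rho / 2) * ||theta - theta_prev||**2
        # over the ball. The objective is an isotropic quadratic, so the
        # constrained minimizer is the projection of the free minimizer.
        theta = project_ball((b + rho * theta) / (L + rho), radius)
    return theta


# Usage on synthetic, noise-free data with unit-norm rows (so L = 1 majorizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star
theta_hat = smm_least_squares(X, y, L=1.0)
print("final mean squared residual:", np.mean((X @ theta_hat - y) ** 2))
```

The samples here are i.i.d. for simplicity, whereas the abstract covers dependent data. With surrogates that are only weakly convex, the averaged surrogate would generally have no closed-form minimizer, and the projected step above would be replaced by an approximate solver.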