We study a class of nonconvex nonsmooth optimization problems in which the objective is the sum of two functions: one is the average of a large number of differentiable functions, while the other is proper and lower semicontinuous and admits a surrogate function that satisfies standard assumptions. Such problems arise in machine learning and regularized empirical risk minimization. However, nonconvexity and the large-sum structure make algorithm design challenging, and effective algorithms for this setting are scarce. We introduce and study three stochastic variance-reduced majorization-minimization (MM) algorithms that combine the general MM principle with new variance-reduction techniques. We prove almost sure subsequential convergence of the generated sequence to a stationary point. We further show that our algorithms attain the best-known complexity bounds in terms of gradient evaluations. We demonstrate the effectiveness of our algorithms on sparse binary classification problems, sparse multi-class logistic regression, and neural networks, using several widely used, publicly available data sets.
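To make the combination of MM and variance reduction concrete, the following is a minimal sketch and not one of the three algorithms studied here, whose details are not given in this abstract. It assumes an SVRG-style gradient estimator, least-squares component functions, an l1 regularizer (convex, chosen only for simplicity), and a quadratic upper bound as the surrogate of the smooth part. Under these assumptions each MM step reduces to a soft-thresholding update, so the scheme coincides with proximal SVRG. The names `svrg_mm_sketch`, `objective`, and `soft_threshold`, and the step size `0.1 / L`, are illustrative choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Closed-form minimizer of 0.5*||y - v||^2 + t*||y||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, x, lam):
    """(1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def svrg_mm_sketch(A, b, lam=0.1, epochs=20, inner=None, seed=0):
    """Illustrative variance-reduced MM loop (assumed setup, see lead-in)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if inner is None:
        inner = n
    # Upper bound on the Lipschitz constant of each component gradient.
    L = np.max(np.sum(A ** 2, axis=1))
    step = 0.1 / L  # conservative step size (an assumption, not tuned)
    x = np.zeros(d)
    for _ in range(epochs):
        # Snapshot point and its full gradient, computed once per epoch.
        snapshot = x.copy()
        full_grad = A.T @ (A @ snapshot - b) / n
        for _ in range(inner):
            i = rng.integers(n)
            grad_i = A[i] * (A[i] @ x - b[i])
            grad_i_snap = A[i] * (A[i] @ snapshot - b[i])
            # SVRG estimator: unbiased, with variance that shrinks as x
            # and the snapshot approach the same point.
            v = grad_i - grad_i_snap + full_grad
            # MM step: minimize the surrogate
            #   <v, y> + (1/(2*step))*||y - x||^2 + lam*||y||_1,
            # whose minimizer is a soft-thresholded gradient step.
            x = soft_threshold(x - step * v, step * lam)
    return x
```

Calling `svrg_mm_sketch(A, b)` with a data matrix `A` of shape `(n, d)` and targets `b` of shape `(n,)` returns the final iterate. Replacing the l1 term by a nonconvex penalty would require a different surrogate and a different inner minimization, which is the regime this work targets.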