We study first-order methods with preconditioning for solving structured nonlinear convex optimization problems. We propose a new family of preconditioners generated by symmetric polynomials. These preconditioners give first-order optimization methods a provable improvement of the condition number by cutting the gaps between the highest eigenvalues, without requiring explicit knowledge of the actual spectrum. We give a stochastic interpretation of this preconditioning in terms of coordinate volume sampling and compare it with other classical approaches, including Chebyshev polynomials. We show how to incorporate polynomial preconditioning into the Gradient and Fast Gradient Methods and establish the corresponding global complexity bounds. Finally, we propose a simple adaptive search procedure that automatically chooses the best possible polynomial preconditioning for the Gradient Method by minimizing the objective along a low-dimensional Krylov subspace. Numerical experiments confirm the efficiency of our preconditioning strategies for solving various machine learning problems.
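To make the idea of cutting the gap at the top of the spectrum concrete, here is a minimal sketch on a diagonal quadratic. It assumes, purely for illustration, that the simplest member of such a family is the preconditioner P = tr(A) I - A; the abstract does not state this formula, so treat it as an assumption. The function names (`precondition_spectrum`, `gradient_iterations`) and the test spectrum are hypothetical. Because A and P commute, everything is done in the eigenbasis of A, and the known spectrum is used only to set the step size and to measure the effect, which a practical method would avoid.

```python
def precondition_spectrum(eigs):
    """Eigenvalues of P A for the assumed P = tr(A) I - A, given those of A."""
    trace = sum(eigs)
    return [lam * (trace - lam) for lam in eigs]


def condition_number(eigs):
    """Ratio of the largest to the smallest (positive) eigenvalue."""
    return max(eigs) / min(eigs)


def gradient_iterations(eigs, scale, tol=1e-8, max_iter=1_000_000):
    """Preconditioned gradient steps on f(x) = 0.5 * sum(eigs[i] * x[i]**2).

    `scale` is the diagonal of the preconditioner (all ones means none).
    The step size is 1/L with L the largest eigenvalue of P A.
    Returns the number of iterations until max |x[i]| <= tol.
    """
    L = max(s * lam for s, lam in zip(scale, eigs))
    x = [1.0] * len(eigs)
    for k in range(max_iter):
        if max(abs(v) for v in x) <= tol:
            return k
        # The gradient of f is eigs[i] * x[i]; the preconditioner rescales it.
        x = [v - (s * lam * v) / L for v, s, lam in zip(x, scale, eigs)]
    return max_iter


# Hypothetical spectrum with one large outlying eigenvalue.
eigs = [1000.0, 10.0, 1.0]
trace = sum(eigs)

kappa_plain = condition_number(eigs)
kappa_precond = condition_number(precondition_spectrum(eigs))

plain = gradient_iterations(eigs, [1.0] * len(eigs))
precond = gradient_iterations(eigs, [trace - lam for lam in eigs])

print(f"condition number: {kappa_plain:.1f} -> {kappa_precond:.2f}")
print(f"iterations:       {plain} -> {precond}")
```

In this toy spectrum the outlying eigenvalue 1000 is mapped to 1000 * 11 = 11000 while the smallest eigenvalue 1 is mapped to 1 * 1010 = 1010, so the ratio between them shrinks from 1000 to roughly 10.9, and the gradient iteration needs correspondingly fewer steps.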