Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems. When it comes to nonconvex minimax optimization, however, current convergence analyses of gradient descent ascent (GDA) combined with adaptive stepsizes require careful tuning of hyper-parameters and the knowledge of problem-dependent parameters. Such a discrepancy arises from the primal-dual nature of minimax problems and the necessity of delicate time-scale separation between the primal and dual updates in attaining convergence. In this work, we propose a single-loop adaptive GDA algorithm called TiAda for nonconvex minimax optimization that automatically adapts to the time-scale separation. Our algorithm is fully parameter-agnostic and can achieve near-optimal complexities simultaneously in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems. The effectiveness of the proposed method is further justified numerically for a number of machine learning applications.
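The abstract does not spell out the update rule, but the idea of automatically adapting to the time-scale separation can be illustrated with a short sketch. In the hypothetical update below, both players use AdaGrad-style gradient accumulators, and the primal stepsize is additionally divided by the larger of the two accumulators raised to a higher exponent, so the descent update slows down relative to the ascent update without manual tuning. The function name `tiada_style_gda`, the exponents `alpha` and `beta`, and the base stepsizes are illustrative placeholders and not necessarily the paper's exact algorithm.

```python
import numpy as np

def tiada_style_gda(grad_x, grad_y, x, y, steps=1000,
                    eta_x=0.1, eta_y=0.1, alpha=0.6, beta=0.4):
    """Hypothetical sketch of a single-loop adaptive GDA update.

    The primal (descent) stepsize is scaled by the *larger* of the two
    AdaGrad-style accumulators raised to alpha > beta, so the x-update
    automatically runs on a slower time scale than the y-update.
    This is an illustrative guess at the structure described in the
    abstract, not a faithful reproduction of the paper's method.
    """
    v_x, v_y = 1e-8, 1e-8  # small init avoids division by zero
    for _ in range(steps):
        g_x, g_y = grad_x(x, y), grad_y(x, y)
        v_x += np.sum(g_x ** 2)   # accumulate squared primal gradient norms
        v_y += np.sum(g_y ** 2)   # accumulate squared dual gradient norms
        # Primal step: denominator couples both accumulators (slower scale).
        x = x - eta_x / (max(v_x, v_y) ** alpha) * g_x
        # Dual step: standard AdaGrad-norm ascent (faster scale).
        y = y + eta_y / (v_y ** beta) * g_y
    return x, y

# Toy usage on f(x, y) = x*y - 0.5*y**2, which is strongly concave in y;
# the minimax solution is (x, y) = (0, 0).
x_sol, y_sol = tiada_style_gda(lambda x, y: y,          # grad_x f
                               lambda x, y: x - y,      # grad_y f
                               x=np.array(1.0), y=np.array(0.5))
```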