For min-max optimization and variational inequality problems (VIPs) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgetting of initial conditions, but their convergence behavior is more complicated, even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting constant step-size SEG/SGDA as time-homogeneous Markov chains, we establish a first-of-its-kind Law of Large Numbers and Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the von Neumann value. Finally, we establish that Richardson-Romberg extrapolation can bring the average iterate closer to the global solution for VIPs. Our probabilistic analysis, supported by experiments that corroborate our theoretical findings, harnesses techniques from optimization, Markov chains, and operator theory.
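As an illustration of the step-size-induced bias and of Richardson-Romberg extrapolation, the following is a minimal sketch and not the paper's experimental setup. It assumes a toy strongly-convex-strongly-concave objective f(x, y) = h(x) + xy - h(y) with h(u) = exp(u) - u + u^2/2 (saddle point at the origin), additive Gaussian gradient noise, and the common two-step-size form of the extrapolation, 2 * avg(step) - avg(2 * step). The function names `operator`, `averaged_sgda`, and `richardson_romberg` are hypothetical.

```python
import numpy as np


def operator(z):
    """F(x, y) = (df/dx, -df/dy) for the assumed toy objective
    f(x, y) = h(x) + x*y - h(y), with h'(u) = exp(u) - 1 + u."""
    x, y = z[..., 0], z[..., 1]
    return np.stack([np.expm1(x) + x + y, np.expm1(y) + y - x], axis=-1)


def averaged_sgda(step, n_chains=2000, n_steps=5000, burn_in=500,
                  noise_std=1.0, seed=0):
    """Run constant step-size SGDA on many independent chains and
    return the iterate average over chains and post-burn-in steps."""
    rng = np.random.default_rng(seed)
    z = np.zeros((n_chains, 2))          # start every chain at the saddle point
    running_sum = np.zeros(2)
    for k in range(burn_in + n_steps):
        noise = noise_std * rng.standard_normal(z.shape)
        z = z - step * (operator(z) + noise)   # simultaneous descent/ascent step
        if k >= burn_in:
            running_sum += z.mean(axis=0)
    return running_sum / n_steps


def richardson_romberg(avg_step, avg_double_step):
    """Combine averages obtained with step sizes g and 2g so that a bias
    term linear in the step size cancels (assumed form: 2*a - b)."""
    return 2.0 * avg_step - avg_double_step


if __name__ == "__main__":
    step = 0.05
    avg_small = averaged_sgda(step, seed=0)
    avg_large = averaged_sgda(2 * step, seed=1)
    extrapolated = richardson_romberg(avg_small, avg_large)
    # The saddle point is (0, 0), so each norm is the distance to the solution.
    print("error with step g :", np.linalg.norm(avg_small))
    print("error with step 2g:", np.linalg.norm(avg_large))
    print("error extrapolated:", np.linalg.norm(extrapolated))
```

In this sketch the averaged iterate settles at a point offset from the saddle point by an amount that grows with the step size, and combining the two runs removes the part of that offset that is linear in the step size. The nonlinear term in h is what makes the bias visible: with a purely quadratic objective and additive noise, the average would be unbiased.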