Constrained optimization is a powerful framework for enforcing requirements on neural networks. Such constrained deep learning problems are typically solved by applying first-order methods to their min-max Lagrangian formulation, but these approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses both issues, practitioners often favor dual optimistic ascent schemes (a form of PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This equivalence allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving that it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation in the single-step, first-order regime commonly used in constrained deep learning.
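The equivalence claimed above can be illustrated on a toy problem. The sketch below is not the paper's code; it assumes a one-dimensional objective f(x) = x², an equality constraint g(x) = x − 1, and the standard definitions L(x, λ) = f(x) + λ·g(x) and L_ρ(x, λ) = L(x, λ) + (ρ/2)·g(x)². It numerically checks one direction of the correspondence: the primal gradient of the plain Lagrangian, evaluated at an optimistically extrapolated multiplier λ + ρ·g(x), coincides with the primal gradient of the Augmented Lagrangian at (x, λ). The extrapolation form λ + ρ·g(x) is an illustrative choice, not a quote from the paper.

```python
import numpy as np

# Toy equality constraint g(x) = x - 1 (assumed for illustration).
def g(x):
    return x - 1.0

# Primal gradient of the Lagrangian L(x, lam) = x^2 + lam * g(x).
def grad_L(x, lam):
    return 2.0 * x + lam

# Primal gradient of the Augmented Lagrangian
# L_rho(x, lam) = x^2 + lam * g(x) + (rho / 2) * g(x)**2.
def grad_L_rho(x, lam, rho):
    return 2.0 * x + lam + rho * g(x)

rng = np.random.default_rng(0)
for _ in range(5):
    x, lam = rng.normal(), rng.normal()
    rho = abs(rng.normal())
    # Optimistically extrapolated multiplier (illustrative form):
    lam_optimistic = lam + rho * g(x)
    # The two primal gradients agree, so both schemes take the same
    # primal step at (x, lam).
    assert np.isclose(grad_L(x, lam_optimistic), grad_L_rho(x, lam, rho))
print("primal gradients match")
```

The check works because ∇ₓL(x, λ + ρg(x)) = ∇f(x) + (λ + ρg(x))∇g(x) = ∇ₓL_ρ(x, λ); the quadratic penalty of the ALM is absorbed into the extrapolated multiplier.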