As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
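To make the distinction concrete, the following toy sketch contrasts a directional Jacobian penalty of the AAJR kind, \(\|J v\|^2\) along a single adversarial ascent direction \(v\), with a global Frobenius penalty \(\|J\|_F^2\). The policy `f`, its weight matrix `W`, and the quadratic inner loss are illustrative assumptions, not the paper's actual setup; by Cauchy–Schwarz the directional penalty is always bounded by the global one, which is the sense in which the global constraint is more conservative.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's model): a small smooth
# "policy" f(x) = tanh(W x) with a known analytic Jacobian.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))

def f(x):
    return np.tanh(W @ x)

def jacobian(x):
    # d/dx tanh(W x) = diag(1 - tanh^2(W x)) @ W
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

x = rng.normal(size=4)
J = jacobian(x)

# Hypothetical adversarial ascent direction: gradient of an inner loss
# 0.5 * ||f(x) - 1||^2 with respect to x, normalized to unit length.
g = J.T @ (f(x) - 1.0)
v = g / np.linalg.norm(g)

# Trajectory-aligned penalty: sensitivity only along v.
aajr_penalty = float(np.sum((J @ v) ** 2))      # ||J v||^2
# Global penalty: sensitivity in every direction at once.
global_penalty = float(np.sum(J ** 2))          # ||J||_F^2 (Frobenius)

# ||J v||^2 <= ||J||_F^2 for any unit v, so the directional constraint
# admits strictly more policies than the global one.
assert 0.0 < aajr_penalty <= global_penalty
```

In a training loop, the directional penalty would be added to the outer objective at each step using the current inner-loop ascent direction, leaving sensitivity in all other directions unconstrained.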