Designing control policies for stabilization tasks with provable guarantees is a long-standing problem in nonlinear control. A key performance metric is the size of the resulting region of attraction, which serves as a robustness "margin" of the closed-loop system against uncertainties. In this paper, we propose a new method to train a stabilizing neural network controller together with its corresponding Lyapunov certificate, aiming to maximize the resulting region of attraction while respecting the actuation constraints. Central to our approach is Zubov's Partial Differential Equation (PDE), which precisely characterizes the true region of attraction of a given control policy. Our framework follows an actor-critic pattern in which we alternate between improving the control policy (actor) and learning a Zubov function (critic). After training, we compute the largest certifiable region of attraction by invoking an SMT solver. Our numerical experiments on several design problems show consistent and significant improvements in the size of the resulting region of attraction.
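To make the role of Zubov's equation concrete, the following is a minimal sketch on a one-dimensional toy system that is not taken from the paper. It assumes one common textbook form of Zubov's equation, V'(x) f(x) = -phi(x) (1 - V(x)), whose solution satisfies V(0) = 0 and 0 < V(x) < 1 exactly on the region of attraction; the formulation used in the paper may differ. The dynamics `f`, the weight `phi`, and the candidate `V` are illustrative choices, picked so that a closed-form solution exists: for x' = -x + x^3 the region of attraction of the origin is the interval (-1, 1), which coincides with the sublevel set {V < 1} of V(x) = x^2.

```python
def f(x):
    """Toy closed-loop dynamics x' = -x + x**3 (illustrative, not from the paper)."""
    return -x + x**3


def phi(x):
    """Positive-definite weight in Zubov's equation, chosen to admit a closed form."""
    return 2.0 * x**2


def V(x):
    """Candidate Zubov function; {x : V(x) < 1} is the interval (-1, 1)."""
    return x**2


def dV(x):
    """Derivative of V with respect to x."""
    return 2.0 * x


def zubov_residual(x):
    """Residual of the assumed Zubov equation V'(x) f(x) + phi(x) (1 - V(x)) = 0."""
    return dV(x) * f(x) + phi(x) * (1.0 - V(x))


def simulate(x0, dt=1e-3, steps=20000, blowup=10.0):
    """Forward-Euler rollout from x0; stops early once |x| exceeds `blowup`."""
    x = x0
    for _ in range(steps):
        x = x + dt * f(x)
        if abs(x) > blowup:
            break
    return x


# The residual should vanish (up to rounding) everywhere, not only inside (-1, 1).
grid = [-1.5 + 0.01 * i for i in range(301)]
max_residual = max(abs(zubov_residual(x)) for x in grid)

# Rollouts on either side of the boundary V(x) = 1.
inside = simulate(0.9)    # V(0.9) = 0.81 < 1
outside = simulate(1.1)   # V(1.1) = 1.21 > 1

print("max |residual| on grid:", max_residual)
print("final state from x0 = 0.9:", inside)
print("final state from x0 = 1.1:", outside)
```

In this sketch the trajectory started inside {V < 1} decays toward the origin, while the one started outside leaves any bounded set. In a learning setting such as the one described in the abstract, a residual of this kind would typically be minimized over sampled states to fit the critic, since no closed-form solution is available; that training loop is not shown here.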