While ensuring stability for linear systems is well understood, it remains a major challenge for systems with nonlinear dynamics. A general approach in such cases is to leverage Lyapunov stability theory to compute a combination of a Lyapunov control function and an associated control policy. However, finding Lyapunov functions for general nonlinear systems is a challenging task. To address this challenge, several methods have recently been proposed that represent Lyapunov functions using neural networks. These approaches, however, have been designed exclusively for continuous-time systems. We propose the first approach for learning neural Lyapunov control in discrete-time systems. Three key ingredients enable us to effectively learn provably stable control policies. The first is a novel mixed-integer linear programming approach for verifying the stability conditions in discrete-time systems. The second is a novel approach for computing sublevel sets that characterize the region of attraction. The third is a heuristic gradient-based approach for quickly finding counterexamples, which significantly speeds up Lyapunov function learning. Our experiments on four standard benchmarks demonstrate that our approach significantly outperforms state-of-the-art baselines. For example, on the path tracking benchmark, we outperform recent neural Lyapunov control baselines by an order of magnitude in both running time and the size of the region of attraction, and on two of the four benchmarks (cartpole and PVTOL), ours is the first automated approach to return a provably stable controller.
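As a rough illustration of the gradient-based counterexample search mentioned above, the sketch below looks for a state that violates the discrete-time decrease condition V(f(x)) - V(x) < 0 by projected gradient ascent on the violation. It is not the paper's implementation: the closed-loop map `step`, the quadratic candidates `V_good` and `V_bad`, the box bounds, the step size, and the finite-difference gradient are all assumptions chosen to keep the toy example self-contained. The approach described above uses neural networks, and a failed heuristic search certifies nothing; certification is the job of the mixed-integer linear programming verifier.

```python
import numpy as np

def violation(V, step, x):
    """Decrease-condition violation at x: a positive value means V does not decrease."""
    return V(step(x)) - V(x)

def numerical_grad(g, x, eps=1e-6):
    """Central-difference gradient of a scalar function g at x (assumed stand-in for autodiff)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (g(x + e) - g(x - e)) / (2 * eps)
    return grad

def find_counterexample(V, step, x0, lo, hi, lr=0.1, iters=200):
    """Projected gradient ascent on the violation inside the box [lo, hi]^n.

    Returns a violating state if one is found, otherwise None.
    None means only that this heuristic search failed, not that V is valid.
    """
    x = np.array(x0, dtype=float)
    g = lambda z: violation(V, step, z)
    for _ in range(iters):
        if g(x) > 0:
            return x
        # Ascent step, then projection back onto the box.
        x = np.clip(x + lr * numerical_grad(g, x), lo, hi)
    return x if g(x) > 0 else None

# Hypothetical closed-loop dynamics x_{t+1} = A x_t (controller already folded in).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
step = lambda x: A @ x

# Two hypothetical quadratic candidates V(x) = x^T P x.
P_bad = np.diag([1.0, 0.01])
V_good = lambda x: float(x @ x)           # P = I: decreases along every trajectory of A
V_bad = lambda x: float(x @ P_bad @ x)    # increases along some directions

cex_good = find_counterexample(V_good, step, [0.5, -0.5], -1.0, 1.0)
cex_bad = find_counterexample(V_bad, step, [0.5, -0.5], -1.0, 1.0)
print("counterexample for V_good:", cex_good)
print("counterexample for V_bad:", cex_bad)
```

In a learning loop of the kind the abstract describes, any state returned by such a search would be added to the training set before the more expensive exact verifier is invoked. With the toy candidates here, the search finds no violation for `V_good` and returns a violating state for `V_bad`.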