Optimal control (OC) is an effective approach to controlling complex dynamical systems. However, typical approaches to parameterising and learning controllers in optimal control have been ad hoc: data is collected first and then fitted to neural networks. This two-step approach can overlook crucial constraints such as optimality and time-varying conditions. We introduce a unified, function-first framework that simultaneously learns Lyapunov or value functions while implicitly solving OC problems. We propose two mathematical programs, one based on the Hamilton-Jacobi-Bellman (HJB) constraint and one on its relaxation, to learn time-varying value and Lyapunov functions. We show the effectiveness of our approach on linear and nonlinear control-affine problems. The proposed methods generate near-optimal trajectories and guarantee the Lyapunov condition over a compact set of initial conditions. Furthermore, we compare our methods to Soft Actor-Critic (SAC) and Proximal Policy Optimisation (PPO). In this comparison, we never underperform in task cost and, in the best cases, outperform SAC and PPO by factors of 73 and 22, respectively.
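To make the HJB constraint and the Lyapunov condition mentioned in the abstract concrete, the following is a minimal sketch on a scalar linear-quadratic problem. It is not the method proposed here: it assumes a time-invariant, infinite-horizon setting with dynamics dx/dt = a*x + b*u and running cost q*x^2 + r*u^2, and the plant parameters, cost weights, and function names are all hypothetical. It checks that the closed-form quadratic value function V(x) = p*x^2 has zero HJB residual and decreases along closed-loop trajectories.

```python
import math

def riccati_gain(a, b, q, r):
    """Positive root p of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def optimal_control(x, p, b, r):
    """Minimiser of the HJB right-hand side for V(x) = p*x**2, i.e. u* = -(b*p/r)*x."""
    return -(b * p / r) * x

def hjb_residual(x, p, a, b, q, r):
    """Running cost plus dV/dx * f(x, u*); zero when V is the optimal value function."""
    u = optimal_control(x, p, b, r)
    dV_dx = 2.0 * p * x
    return q * x * x + r * u * u + dV_dx * (a * x + b * u)

def lyapunov_rate(x, p, a, b, r):
    """Time derivative of V along the closed-loop dynamics; negative means V decreases."""
    u = optimal_control(x, p, b, r)
    return 2.0 * p * x * (a * x + b * u)

# Hypothetical open-loop-unstable plant and cost weights (assumptions for illustration).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p = riccati_gain(a, b, q, r)

# Sample a few nonzero states from a compact interval of initial conditions.
states = [-2.0, -0.5, 0.5, 2.0]
residuals = [hjb_residual(x, p, a, b, q, r) for x in states]
rates = [lyapunov_rate(x, p, a, b, r) for x in states]
```

In a learning setting, a residual of this kind would typically be evaluated on a parameterised candidate function rather than a closed-form solution, and the relaxed variant would require only that the decrease condition hold rather than that the residual vanish; both statements are a reading of the abstract, not a description of the authors' implementation.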