Optimal control (OC) is an effective approach to controlling complex dynamical systems. However, traditional approaches to parameterising and learning controllers for optimal control have been ad hoc: data is collected and then fitted with neural networks. As a result, the learnt controllers can ignore constraints such as optimality and time variability. We introduce a unified framework that simultaneously solves control problems and learns the corresponding Lyapunov or value functions. Our method formulates OC-like mathematical programs based on the Hamilton-Jacobi-Bellman (HJB) equation. We leverage the HJB optimality constraint and its relaxation to learn time-varying value and Lyapunov functions, so that these constraints are incorporated implicitly. We demonstrate the effectiveness of our approach on linear and nonlinear control-affine problems. Additionally, incorporating the learnt functions into Model Predictive Controllers significantly reduces the required planning horizon, by up to a factor of 25.
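To make "the HJB optimality constraint and its relaxation" concrete, the sketch below checks both conditions for a time-varying value function that is known in closed form. It is not the authors' method: the paper learns these functions by solving mathematical programs, whereas this example only verifies the constraints for a fixed candidate. Everything here is an illustrative assumption: a hypothetical double-integrator problem with quadratic running cost l(x, u) = x^T Q x + u^T R u and terminal cost x^T Qf x, the finite-horizon LQR value function V(t, x) = x^T P(t) x obtained from the Riccati ODE, and one standard relaxation in which the HJB equality becomes the inequality dW/dt + min_u [l + grad W . f] >= 0. Any W that satisfies this inequality with W(T, x) <= x^T Qf x is a lower bound on V. The helper names `riccati_rhs`, `solve_riccati`, `hjb_lhs` and `check_hjb` are hypothetical.

```python
import numpy as np

# Hypothetical example problem (not from the paper): a double integrator
# dx/dt = A x + B u, which is linear and therefore control-affine.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # running state cost x^T Q x
R = np.array([[1.0]])  # running input cost u^T R u
Qf = np.eye(2)         # terminal cost x^T Qf x
T, N = 2.0, 2000       # horizon length and number of time steps
DT = T / N
R_INV = np.linalg.inv(R)


def riccati_rhs(P):
    """dP/dt from the Riccati ODE -dP/dt = A^T P + P A - P B R^-1 B^T P + Q."""
    return -(A.T @ P + P @ A - P @ B @ R_INV @ B.T @ P + Q)


def solve_riccati():
    """Integrate P(t) backward in time from P(T) = Qf using classic RK4."""
    Ps = np.empty((N + 1, 2, 2))
    Ps[N] = Qf
    h = -DT  # negative step: march from t = T down to t = 0
    for k in range(N, 0, -1):
        P = Ps[k]
        k1 = riccati_rhs(P)
        k2 = riccati_rhs(P + 0.5 * h * k1)
        k3 = riccati_rhs(P + 0.5 * h * k2)
        k4 = riccati_rhs(P + h * k3)
        Ps[k - 1] = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return Ps


def hjb_lhs(c, P, dPdt, x):
    """Evaluate dW/dt + min_u [l(x, u) + grad W . f(x, u)] for W(t, x) = c x^T P(t) x.

    For this quadratic W the minimum over u is attained at u* = -c R^-1 B^T P x.
    """
    dWdt = c * (x @ dPdt @ x)
    grad = 2.0 * c * (P @ x)
    u = -0.5 * (R_INV @ B.T @ grad)  # minimiser of u^T R u + grad . (B u)
    stage = x @ Q @ x + u @ R @ u + grad @ (A @ x + B @ u)
    return dWdt + stage


def check_hjb(seed=0):
    """Return (max |HJB residual| of V, min relaxed-HJB value of W = 0.5 V) at random states."""
    Ps = solve_riccati()
    rng = np.random.default_rng(seed)
    max_residual, min_relaxed = 0.0, np.inf
    for k in range(1, N):
        dPdt = (Ps[k + 1] - Ps[k - 1]) / (2 * DT)  # central difference in time
        x = rng.standard_normal(2)
        max_residual = max(max_residual, abs(hjb_lhs(1.0, Ps[k], dPdt, x)))
        min_relaxed = min(min_relaxed, hjb_lhs(0.5, Ps[k], dPdt, x))
    return max_residual, min_relaxed


max_residual, min_relaxed = check_hjb()
print(f"max |HJB residual| for V:        {max_residual:.2e}")
print(f"min relaxed-HJB value for 0.5 V: {min_relaxed:.2e}")
```

The time derivative of P is taken from a finite difference of the integrated solution rather than from `riccati_rhs`, so the near-zero HJB residual for V is a genuine consistency check and not an identity. For W = c V with 0 <= c <= 1, the relaxed quantity works out to (1 - c) x^T Q x + c (1 - c) x^T P B R^-1 B^T P x >= 0, and W(T, x) = c x^T Qf x <= x^T Qf x. This illustrates why feasible points of the relaxed constraint bound the true value function from below.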