Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning

Model Predictive Control (MPC) is widely used for autonomous-vehicle (AV) motion planning, but its real-time applicability is often limited by its reliance on accurate models and by the need to solve nonlinear, nonconvex optimization problems online in dynamic road environments. Actor-critic reinforcement learning offers a promising alternative for online policy generation, yet its policy-learning process often lacks explicit control-theoretic structure. This article proposes a learning predictive control (LPC) framework built on deep Koopman operators for efficient real-time motion planning under nonconvex constraints. To handle nonlinear and uncertain vehicle dynamics, a deep-Koopman-based predictor learns from data a lifting of the system into an interpretable space of observables in which the dynamics are represented linearly. Unlike traditional MPC, which computes open-loop control sequences, the proposed LPC framework yields a closed-loop state-feedback policy within each prediction interval through receding-horizon actor-critic learning. To ensure safety under nonconvex environmental constraints, LPC constructs convex local surrogate representations of obstacles and defines corresponding potential-field functions. These functions and their gradients are embedded directly into the actor-critic structure, enabling efficient, safety-aware policy learning. Extensive simulations and real-world experiments on the HongQi-EHS3 platform show that, across diverse obstacle-avoidance scenarios, LPC compares favorably with benchmark methods such as CBF-MPC and LMPCC in terms of safety, computational efficiency, and driving comfort.
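Two ingredients named in the abstract can be made concrete with a small example: a linear predictor acting in a lifted space of observables, and a potential field defined on a convex obstacle surrogate whose gradient is available in closed form. The sketch below is a minimal, hypothetical illustration and not the paper's method. It replaces the deep encoder with a fixed polynomial dictionary (extended dynamic mode decomposition with control, a swapped-in technique), uses a toy two-state plant chosen so that the lifting is exact, and uses an ellipse with a quadratic interior penalty as the surrogate. All function names, coefficients, and the penalty form are assumptions made for illustration.

```python
import numpy as np

# --- 1) Koopman-style linear predictor (EDMD with control) -----------------
# Hypothetical toy plant that becomes exactly linear in the observables
# psi(x) = [x1, x2, x1^2]:  x1+ = a*x1,  x2+ = b*x2 + c*x1^2 + u.
A_COEF, B_COEF, C_COEF = 0.9, 0.8, 0.5


def step(x, u):
    """One step of the toy nonlinear plant (illustrative, not a vehicle model)."""
    x1, x2 = x
    return np.array([A_COEF * x1, B_COEF * x2 + C_COEF * x1**2 + u])


def lift(x):
    """Fixed observable dictionary; stands in for a learned deep encoder."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])


def fit_koopman(xs, us, xs_next):
    """Least-squares fit of z+ = A z + B u from one-step transition data."""
    Z = np.array([lift(x) for x in xs])                    # (N, n_z) lifted states
    Z_next = np.array([lift(x) for x in xs_next])          # (N, n_z) lifted successors
    U = np.asarray(us, dtype=float).reshape(len(us), -1)   # (N, n_u) inputs
    G = np.hstack([Z, U])                                  # regressors [z, u]
    # Solve Z_next ~= G @ M in the least-squares sense; then [A B] = M^T.
    M, *_ = np.linalg.lstsq(G, Z_next, rcond=None)
    K = M.T
    n_z = Z.shape[1]
    return K[:, :n_z], K[:, n_z:]


def predict(x0, u_seq, A, B):
    """Roll the linear model forward; the first two observables decode the state."""
    z = lift(x0)
    states = []
    for u in u_seq:
        z = A @ z + B @ np.atleast_1d(u)
        states.append(z[:2].copy())
    return np.array(states)


# --- 2) Potential field on a convex (elliptical) obstacle surrogate --------
def ellipse_potential(p, center, P, weight=10.0):
    """Hypothetical penalty w * max(0, 1 - d)^2 with d = (p-c)^T P (p-c).

    d < 1 inside the ellipse, so the penalty is zero outside it and grows
    toward the centre. P must be symmetric positive definite. Returns the
    value and its analytic gradient with respect to p.
    """
    r = np.asarray(p, dtype=float) - center
    d = r @ P @ r
    s = max(0.0, 1.0 - d)
    value = weight * s**2
    # Chain rule with grad d = 2 P r; -grad points out of the obstacle.
    grad = -4.0 * weight * s * (P @ r)
    return value, grad


# --- Usage on synthetic data ------------------------------------------------
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=(200, 2))
us = rng.uniform(-1.0, 1.0, size=200)
xs_next = np.array([step(x, u) for x, u in zip(xs, us)])
A_hat, B_hat = fit_koopman(xs, us, xs_next)

x0, u_seq = np.array([0.5, -0.2]), np.linspace(-0.5, 0.5, 10)
true_traj, x = [], x0
for u in u_seq:
    x = step(x, u)
    true_traj.append(x)
pred_err = np.max(np.abs(predict(x0, u_seq, A_hat, B_hat) - np.array(true_traj)))
print("max multi-step prediction error:", pred_err)

val, grad = ellipse_potential([0.3, 0.1], np.zeros(2), np.diag([1.0, 4.0]))
print("penalty:", val, "gradient:", grad)
```

With a learned encoder the lifted model is generally only approximate, so the multi-step error would not vanish as it does for this exactly liftable toy plant. Having the penalty gradient in closed form is one way such a potential field could be fed directly into a gradient-based actor-critic update.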

翻译：模型预测控制（MPC）被广泛应用于自动驾驶车辆（AV）的运动规划，但其在实际场景中的实时应用常受限于对精确模型的需求以及在动态道路环境中非线性、非凸优化问题的在线求解。演员-评论家强化学习为在线策略生成提供了一种有前景的替代方案，但其策略学习过程往往缺乏明确的控制理论结构。本文提出了一种结合深度库普曼算子的学习预测控制（LPC）框架，用于在非凸约束下实现高效的实时运动规划。为处理非线性和不确定的车辆动力学，本文采用基于深度库普曼的预测器，以数据驱动的方式将系统提升至可解释的线性可观测空间。与传统MPC计算开环控制序列不同，所提出的LPC框架通过滚动时域演员-评论家学习，在每个预测区间内生成闭环状态反馈策略。为确保非凸环境约束下的安全性，LPC构建了障碍物的凸局部替代表征，并定义了相应的势场函数。这些函数及其梯度被直接嵌入演员-评论家结构中，从而实现高效且具有安全意识的策略学习。在红旗EHS3平台上进行的广泛仿真和实车实验表明，与CBF-MPC和LMPCC等基准方法相比，所提方法在多种避障场景下的安全性、计算效率和驾驶舒适性方面均表现出优越性能。