Model predictive control (MPC) is a powerful, optimization-based approach for controlling dynamical systems. However, the computational cost of online optimization can be problematic on embedded devices, especially when fixed control frequencies must be guaranteed. Previous work has therefore proposed reducing the computational burden via imitation learning (IL), approximating the MPC policy with a neural network. In this work, we instead learn the whole trajectory planned by the MPC. We introduce a combination of a novel neural network architecture, PlanNetX, and a simple loss function based on the state trajectory that leverages the parameterized optimal control structure of the MPC. We validate our approach in the context of autonomous driving by learning a longitudinal planner and benchmarking it extensively in the CommonRoad simulator, using both synthetic scenarios and scenarios derived from real data. Our experimental results show that we can learn the open-loop MPC trajectory with high accuracy while improving the closed-loop performance of the learned control policy over baselines such as behavior cloning.