Model predictive control (MPC) is a powerful, optimization-based approach for controlling dynamical systems. However, the computational cost of online optimization can be problematic on embedded devices, especially when fixed control frequencies must be guaranteed. Previous work has therefore proposed reducing this computational burden with imitation learning (IL), approximating the MPC policy with a neural network. In this work, we instead learn the entire trajectory planned by the MPC. We introduce a novel neural network architecture, PlanNetX, combined with a simple loss function based on the state trajectory that leverages the parameterized optimal control structure of the MPC. We validate our approach in the context of autonomous driving by learning a longitudinal planner and benchmarking it extensively in the CommonRoad simulator, using both synthetic scenarios and scenarios derived from real data. Our experimental results show that we can learn the open-loop MPC trajectory with high accuracy while improving the closed-loop performance of the learned control policy over baselines such as behavior cloning.
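To make the contrast between behavior cloning and a state-trajectory loss concrete, the following is a minimal sketch and not the PlanNetX loss itself. The abstract only says the loss is "based on the state trajectory". Here we assume a hypothetical point-mass longitudinal model (position `s`, velocity `v`, acceleration input `a`), a fixed step `dt`, and illustrative per-state `weights`. The predicted control sequence is rolled out through this model and the resulting states are compared with the MPC's planned states. Behavior cloning, by contrast, matches only the first control of the plan.

```python
import numpy as np


def rollout_longitudinal(x0, accels, dt):
    """Integrate a hypothetical point-mass longitudinal model (assumption).

    x0: initial state [position s, velocity v]
    accels: planned accelerations a_0 .. a_{N-1}
    Returns the state trajectory of shape (N + 1, 2), including x0.
    """
    states = np.zeros((len(accels) + 1, 2))
    states[0] = x0
    for k, a in enumerate(accels):
        s, v = states[k]
        # Exact discretization for constant acceleration over one step.
        states[k + 1] = [s + v * dt + 0.5 * a * dt**2, v + a * dt]
    return states


def state_trajectory_loss(x0, pred_accels, mpc_states, dt, weights=(1.0, 1.0)):
    """Weighted MSE between the rolled-out plan and the MPC state trajectory.

    The initial state is shared, so only steps 1..N contribute.
    """
    pred_states = rollout_longitudinal(x0, pred_accels, dt)
    w = np.asarray(weights)
    err = (pred_states[1:] - mpc_states[1:]) ** 2 * w
    return float(err.sum(axis=1).mean())


def behavior_cloning_loss(pred_accels, mpc_accels):
    """Baseline: penalize only the mismatch in the first applied control."""
    return float((pred_accels[0] - mpc_accels[0]) ** 2)


if __name__ == "__main__":
    x0 = np.array([0.0, 10.0])          # start at s = 0 m, v = 10 m/s
    dt = 0.2
    mpc_accels = np.array([1.0, 0.5, 0.0, -0.5])
    mpc_states = rollout_longitudinal(x0, mpc_accels, dt)

    # A prediction that matches the first control but deviates later.
    pred = mpc_accels.copy()
    pred[2] += 1.0
    print(behavior_cloning_loss(pred, mpc_accels))              # 0.0
    print(state_trajectory_loss(x0, pred, mpc_states, dt) > 0)  # True
```

The example shows why a trajectory-level loss carries more signal than behavior cloning: an error late in the plan is invisible to the first-control loss but shows up in the rolled-out states. In practice such a loss would be written in an automatic-differentiation framework so that gradients flow through the rollout into the network. NumPy is used here only to keep the sketch self-contained.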