Model Predictive Control (MPC) is attracting considerable attention as a powerful control technique for autonomous driving. The success of an MPC controller depends strongly on an accurate internal dynamics model. However, the model's static parameters, usually obtained by system identification, often fail to adapt to internal and external perturbations in real-world scenarios. In this paper, we (1) reformulate the problem as a Partially Observed Markov Decision Process (POMDP) that absorbs the uncertainties into the observations and preserves the Markov property in the hidden states; (2) learn a recurrent policy that continually adapts the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and (3) evaluate the proposed algorithm (referred to as $\textit{MPC-RRL}$) in the CARLA simulator, where it exhibits robust behaviours under a wide range of perturbations.
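To make the control loop described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the kinematic bicycle model, the choice of wheelbase as the single adapted parameter, the random-shooting MPC solver, and the names `bicycle_step`, `RecurrentParamPolicy`, `mpc_random_shooting`, and `run_episode`. The recurrent policy uses random, untrained weights, so the sketch shows only the data flow (observation, then recurrent policy, then dynamics parameter, then MPC action); the reinforcement learning step is omitted.

```python
import numpy as np


def bicycle_step(state, action, wheelbase, dt=0.1):
    """One Euler step of a kinematic bicycle model (an assumed stand-in
    for the MPC's internal dynamics model)."""
    x, y, yaw, v = state
    accel, steer = action
    x = x + v * np.cos(yaw) * dt
    y = y + v * np.sin(yaw) * dt
    yaw = yaw + v / wheelbase * np.tan(steer) * dt
    v = v + accel * dt
    return np.array([x, y, yaw, v])


class RecurrentParamPolicy:
    """Hypothetical Elman-style recurrent cell that maps each observation
    to a wheelbase estimate. Weights are random and untrained."""

    def __init__(self, obs_dim, hidden_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.w_out = rng.normal(scale=0.1, size=hidden_dim)
        self.h = np.zeros(hidden_dim)  # hidden state carries the history

    def act(self, obs):
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        # Squash the output into an assumed wheelbase range of (2.0, 3.5) m.
        return 2.75 + 0.75 * np.tanh(self.w_out @ self.h)


def mpc_random_shooting(state, target, wheelbase, rng, horizon=10, n_samples=64):
    """Sample action sequences, roll them out through the internal model,
    and return the first action of the cheapest sequence."""
    low = np.array([-1.0, -0.3])   # assumed limits: acceleration, steering
    high = np.array([1.0, 0.3])
    candidates = rng.uniform(low, high, size=(n_samples, horizon, 2))
    best_cost, best_action = np.inf, candidates[0, 0]
    for seq in candidates:
        s, cost = state.copy(), 0.0
        for a in seq:
            s = bicycle_step(s, a, wheelbase)
            cost += np.sum((s[:2] - target) ** 2)  # squared distance to target
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action


def run_episode(steps=20, true_wheelbase=3.0, seed=0):
    """Closed loop: the policy proposes a model parameter, the MPC plans
    with it, and the 'real' vehicle moves with the true parameter."""
    rng = np.random.default_rng(seed)
    policy = RecurrentParamPolicy(obs_dim=4, seed=seed)
    state = np.array([0.0, 0.0, 0.0, 1.0])
    target = np.array([5.0, 1.0])
    wheelbases = []
    for _ in range(steps):
        wheelbase = policy.act(state)  # the full state is used as the observation here
        action = mpc_random_shooting(state, target, wheelbase, rng)
        state = bicycle_step(state, action, true_wheelbase)
        wheelbases.append(wheelbase)
    return state, np.array(wheelbases)
```

Calling `run_episode()` returns the final vehicle state and the sequence of wheelbase values the policy proposed. In a trained system, the policy weights would be optimised with a reinforcement learning objective so that the proposed parameters track perturbations; that part is deliberately left out of this sketch.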