This study presents an Actor-Critic reinforcement learning Compensated Model Predictive Controller (AC2MPC) designed for high-speed, off-road autonomous driving on deformable terrains. To address the difficulty of modeling unknown tire-terrain interaction while maintaining real-time control feasibility and performance, the framework integrates deep reinforcement learning with a model predictive controller to manage unmodeled nonlinear dynamics. We evaluate the controller framework over constant and varying velocity profiles using the high-fidelity simulator Project Chrono. Our findings demonstrate that the controller statistically outperforms standalone model-based and learning-based controllers over three unknown terrains: a sandy deformable track, a sandy and rocky track, and a cohesive, clay-like deformable soil track. Despite varied and previously unseen terrain characteristics, the framework generalizes well enough to track longitudinal reference speeds with the least error. Furthermore, it requires significantly less training data than a purely learning-based controller, converging in fewer steps while delivering better performance. Even when under-trained, the controller outperforms the standalone controllers, highlighting its potential for safer and more efficient real-world deployment.