This study presents an Actor-Critic Cooperative Compensated Model Predictive Controller (AC3MPC) designed to address unknown system dynamics. To avoid the difficulty of modeling highly complex dynamics while ensuring real-time control feasibility and performance, this work combines deep reinforcement learning with a model predictive controller in a cooperative framework to handle unknown dynamics. The model-based controller takes on the primary role, and each controller is provided with predictive information about the other. This improves tracking performance and retains the inherent robustness of the model predictive controller. We evaluate this framework for off-road autonomous driving on unknown deformable terrains representing sandy deformable soil, sandy and rocky soil, and cohesive clay-like deformable soil. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers by up to 29.2% and 10.2%, respectively. The framework generalizes well over varied and previously unseen terrain characteristics, tracking longitudinal reference speeds with lower errors. Furthermore, it requires significantly less training data than a purely learning-based controller, while delivering better performance even when under-trained.