Model-free reinforcement learning (RL) is inherently a reactive method: it assumes no prior knowledge of the system and relies entirely on trial and error for learning. This approach faces several challenges, such as poor sample efficiency, limited generalization, and the need for well-designed reward functions to guide learning effectively. On the other hand, controllers based on complete system dynamics require no data. This paper addresses the intermediate situation where there is not enough model information for complete controller design, yet enough to suggest that a purely model-free approach is not the best choice either. By carefully decoupling the known and unknown components of the system dynamics, we obtain an embedded controller guided by our partial model, thereby improving the learning efficiency of an RL-enhanced approach. A modular design allows mainstream RL algorithms to be deployed to refine the policy. Simulation results show that our method significantly improves sample efficiency over standard RL methods on continuous control tasks and outperforms traditional control approaches. Experiments on a real ground vehicle further validate the performance of our method, including its generalization and robustness.