We adapt reinforcement learning (RL) methods for continuous control to bridge the gap between complete ignorance and perfect knowledge of the environment. Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), draws on both model-free RL and model-based control: it exploits the incomplete information provided by a partial model while retaining RL's data-driven adaptation towards optimal performance. The linear quadratic regulator serves as a case study, and numerical experiments demonstrate the effectiveness and benefits of the proposed method.