We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish theoretical properties of our approach and derive an algorithm based on a specific instance of this approach. Our empirical results demonstrate the significant benefits of our approach.