Reinforcement learning (RL) is an important field of research in machine learning that is increasingly being applied to complex optimization problems in physics. In parallel, concepts from physics have contributed to important advances in RL with developments such as entropy-regularized RL. While these developments have led to advances in both fields, obtaining analytical solutions for optimization in entropy-regularized RL is currently an open problem. In this paper, we establish a mapping between entropy-regularized RL and research in non-equilibrium statistical mechanics focusing on Markovian processes conditioned on rare events. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process (MDP) models of reinforcement learning. The results obtained lead to a novel analytical and computational framework for entropy-regularized RL which is validated by simulations. The mapping established in this work connects current research in reinforcement learning and non-equilibrium statistical mechanics, thereby opening new avenues for the application of analytical and computational approaches from one field to cutting-edge problems in the other.
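To illustrate the kind of calculation that the long-time limit makes possible, the following is a minimal sketch of a standard large-deviation construction: tilt a prior Markov chain by exponentiated rewards, take the dominant (Perron-Frobenius) eigenpair of the tilted matrix, and apply a generalized Doob transform to obtain the driven dynamics. It is not taken from the paper's derivation. The names `P` (prior transition matrix over a small set of states, or state-action pairs), `r` (per-step rewards), `beta` (inverse temperature of the entropy regularization) and `driven_process` are assumptions made for this example, and the randomly generated chain is purely illustrative.

```python
import numpy as np

def driven_process(P, r, beta):
    """Dominant eigenpair of the tilted matrix and the resulting driven chain.

    P    : (n, n) row-stochastic prior transition matrix (assumed strictly positive)
    r    : (n,) reward collected on leaving each state (hypothetical reward model)
    beta : inverse temperature weighting reward against the KL cost
    """
    # Tilted matrix: T[x, y] = exp(beta * r[x]) * P[x, y]
    T = np.exp(beta * r)[:, None] * P

    # For a strictly positive matrix the Perron root is real and has the
    # largest real part among all eigenvalues.
    eigvals, eigvecs = np.linalg.eig(T)
    k = np.argmax(eigvals.real)
    rho = eigvals[k].real
    h = eigvecs[:, k].real
    h = h / h.sum()  # fixes the overall sign so that all entries are positive

    # Generalized Doob transform: P_star[x, y] = T[x, y] * h[y] / (rho * h[x])
    P_star = T * h[None, :] / (rho * h[:, None])
    return rho, h, P_star

# Illustrative random chain; these numbers are not from the paper.
rng = np.random.default_rng(0)
n = 6
P = rng.dirichlet(np.ones(n), size=n)   # each row sums to 1
r = rng.uniform(-1.0, 0.0, size=n)
beta = 2.0

rho, h, P_star = driven_process(P, r, beta)
theta = np.log(rho) / beta  # long-time value per step (reward minus KL cost / beta)
```

Because `T @ h = rho * h`, each row of `P_star` sums to one by construction, so the transform always returns a valid Markov chain. Under these assumptions `theta` is the per-step optimum of expected reward minus the KL divergence from the prior dynamics divided by `beta`. Whether the driven chain can be realized by a policy alone depends on the structure of the underlying MDP, which this sketch does not address.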