The assumption that data are independent and identically distributed underpins all machine learning. When data are collected sequentially from agent experiences, as in reinforcement learning, this assumption does not generally hold. Here, we derive a method that overcomes this limitation by exploiting the statistical mechanics of ergodic processes, which we term maximum diffusion reinforcement learning. By decorrelating agent experiences, our approach provably enables agents to learn continually in single-shot deployments, regardless of how they are initialized. Moreover, we prove that our approach generalizes well-known maximum entropy techniques, and we show that it robustly exceeds state-of-the-art performance across popular benchmarks. Our results, at the nexus of physics, learning, and control, pave the way towards more transparent and reliable decision-making in reinforcement learning agents, such as locomoting robots and self-driving cars.