The assumption that data are independent and identically distributed underpins all machine learning. When data are collected sequentially from agent experiences, as in reinforcement learning, this assumption does not generally hold. Here, we derive a method, which we term maximum diffusion reinforcement learning, that overcomes these limitations by exploiting the statistical mechanics of ergodic processes. By decorrelating agent experiences, our approach provably enables agents to learn continually in single-shot deployments regardless of how they are initialized. Moreover, we prove that our approach generalizes well-known maximum entropy techniques, and show that it robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control pave the way towards more transparent and reliable decision-making in reinforcement learning agents, such as locomoting robots and self-driving cars.