The assumption that data are independent and identically distributed underpins all machine learning. In reinforcement learning, where data are collected sequentially from an agent's experiences, this assumption does not generally hold. Here, we derive a method, which we term maximum diffusion reinforcement learning, that overcomes this limitation by exploiting the statistical mechanics of ergodic processes. By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments over the course of individual task attempts. Moreover, we prove that our approach generalizes well-known maximum entropy techniques, and we show that it robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control pave the way towards more transparent and reliable decision-making in reinforcement learning agents, such as locomoting robots and self-driving cars.