The assumption that data are independent and identically distributed underpins all machine learning. When data are collected sequentially from an agent's experiences, as in reinforcement learning, this assumption does not generally hold. Here, we derive a method, which we term maximum diffusion reinforcement learning, that overcomes this limitation by exploiting the statistical mechanics of ergodic processes. By decorrelating agent experiences, our approach provably enables single-shot learning in continuous deployments over the course of individual task attempts. Moreover, we prove that our approach generalizes well-known maximum entropy techniques, and we show that it robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control pave the way towards more transparent and reliable decision-making in reinforcement learning agents, such as locomoting robots and self-driving cars.