Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates. This limits the direct application of modern deep RL algorithms to hardware that is often expensive or safety-critical. In this work, we introduce "Box o Flows", a novel benchtop experimental control system for systematically evaluating RL algorithms in dynamic real-world scenarios. We describe the key components of the Box o Flows and, through a series of experiments, demonstrate how state-of-the-art model-free RL algorithms can synthesize a variety of complex behaviors from simple reward specifications. Furthermore, we explore the role of offline RL in data-efficient hypothesis testing by reusing past experiences. We believe that the insights gained from this preliminary study, together with the availability of systems like the Box o Flows, pave the way for developing systematic RL algorithms that can be generally applied to complex dynamical systems. Supplementary material and videos of experiments are available at https://sites.google.com/view/box-o-flows/home.