Harnessing large, static datasets to develop autonomous multi-agent systems could unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. In industry, however, distributed system processes can often be recorded during operation, and large quantities of demonstration data can be stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective online controllers from such static datasets. Yet offline MARL is still in its infancy and therefore lacks the standardised benchmarks, baselines and evaluation protocols typically found in more mature subfields of RL. This deficiency makes it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing \emph{off-the-grid MARL (OG-MARL)}: a framework for generating offline MARL datasets and algorithms. We release an initial set of datasets and baselines for cooperative offline MARL, created using the framework, along with a standardised evaluation protocol. Our datasets are generated from popular online MARL benchmarks and provide settings that are characteristic of real-world systems, including complex dynamics, non-stationarity, partial observability, suboptimality and sparse rewards. We hope that OG-MARL will serve the community and help steer progress in offline MARL, while also providing an easy entry point for researchers new to the field.