Harnessing large datasets to develop cooperative multi-agent controllers promises to unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. In industry, however, distributed processes can often be recorded during operation, and large quantities of demonstrative data can be stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from such datasets. Yet offline MARL is still in its infancy and therefore lacks the standardised benchmark datasets and baselines typically found in more mature subfields of reinforcement learning (RL). These deficiencies make it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing off-the-grid MARL (OG-MARL): a growing repository of high-quality datasets with baselines for cooperative offline MARL research. Our datasets provide settings that are characteristic of real-world systems, including complex environment dynamics, heterogeneous agents, non-stationarity, many agents, partial observability, suboptimality, sparse rewards and demonstrated coordination. For each setting, we provide a range of dataset types (e.g. Good, Medium, Poor, and Replay) and profile the composition of experiences in each dataset. We hope that OG-MARL will serve the community as a reliable source of datasets and help drive progress, while also providing an accessible entry point for researchers new to the field.