Offline reinforcement learning (offline RL) is an emerging field that has recently attracted attention across application domains because it can learn strategies from previously collected datasets. Offline RL has proved successful on real-world problems that were previously intractable, and we aim to generalize this paradigm to multiplayer games. To this end, we introduce the problem of offline equilibrium finding (OEF) and construct multiple types of datasets across a wide range of games using several established methods. To solve the OEF problem, we design a model-based framework that can directly apply any online equilibrium finding algorithm to the OEF setting with minimal changes. We adapt the three most prominent contemporary online equilibrium finding algorithms to OEF, yielding three model-based variants: OEF-PSRO and OEF-CFR, which generalize the widely used algorithms PSRO and Deep CFR to compute Nash equilibria (NEs), and OEF-JPSRO, which generalizes JPSRO to compute (coarse) correlated equilibria ((C)CEs). We also combine the behavior cloning policy with the model-based policy to further improve performance, and we provide a theoretical guarantee on solution quality. Extensive experimental results demonstrate the superiority of our approach over offline RL algorithms and the importance of using model-based methods for OEF problems. We hope our work will contribute to advancing research in large-scale equilibrium finding.
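The combination of a behavior cloning policy with a model-based policy can be illustrated with a minimal sketch. The abstract does not specify how the two policies are combined, so the convex per-state mixture below, the weight `alpha`, the helper name `mix_policies`, and the dictionary representation of a policy are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical representation: information state -> action probabilities.
Policy = Dict[str, List[float]]


def mix_policies(bc: Policy, mb: Policy, alpha: float) -> Policy:
    """Return the per-state mixture alpha * bc + (1 - alpha) * mb.

    Assumes both policies cover the same states with the same action order.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed: Policy = {}
    for state, bc_probs in bc.items():
        mb_probs = mb[state]
        if len(bc_probs) != len(mb_probs):
            raise ValueError(f"action count mismatch at state {state!r}")
        # A convex combination of two distributions is again a distribution.
        mixed[state] = [
            alpha * p + (1.0 - alpha) * q for p, q in zip(bc_probs, mb_probs)
        ]
    return mixed


# Toy example with one state and two actions.
bc_policy = {"s0": [1.0, 0.0]}  # policy cloned from the dataset
mb_policy = {"s0": [0.5, 0.5]}  # policy computed on the learned model
combined = mix_policies(bc_policy, mb_policy, alpha=0.5)
```

In this sketch, `alpha = 1` recovers the behavior cloning policy and `alpha = 0` recovers the model-based policy. How the weight is chosen is left open here.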