Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and by the sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL that frames each MARL algorithm as a meta-strategy and repeatedly samples normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide the basis for constructing, via bootstrapping, a sampling distribution over a variety of game-analysis statistics. We use this approach to evaluate state-of-the-art deep MARL algorithms on a class of negotiation games. From statistics on individual payoffs, social welfare, and empirical best-response graphs, we uncover strategic relationships among self-play, population-based, model-free, and model-based MARL methods. We also investigate the effect of run-time search as a meta-strategy operator, and find via meta-game analysis that the search version of a meta-strategy generally leads to improved performance.
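The bootstrap procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the array layout `payoffs[a, s, b, t]`, the two-player symmetric setting, the per-algorithm resampling of seeds, and the function names `sample_empirical_game`, `bootstrap_statistic`, `uniform_score`, and `social_welfare` are all assumptions made for illustration.

```python
import numpy as np

def sample_empirical_game(payoffs, rng):
    """Draw one empirical game over algorithms (meta-strategies).

    Assumed layout: payoffs[a, s, b, t] is the row player's payoff when
    algorithm a trained with seed s meets algorithm b trained with seed t.
    """
    n_algs, n_seeds = payoffs.shape[0], payoffs.shape[1]
    # Resample seed indices with replacement, independently per algorithm.
    seeds = rng.integers(0, n_seeds, size=(n_algs, n_seeds))
    game = np.empty((n_algs, n_algs))
    for a in range(n_algs):
        for b in range(n_algs):
            pair = payoffs[a, :, b, :]  # shape: (seeds of a, seeds of b)
            # Average over all sampled seed pairings. For a == b this mixes
            # same-seed (self-play) and different-seed (cross-play) entries.
            game[a, b] = pair[np.ix_(seeds[a], seeds[b])].mean()
    return game

def bootstrap_statistic(payoffs, statistic, n_boot=1000, seed=0):
    """Sampling distribution of `statistic` over resampled empirical games."""
    rng = np.random.default_rng(seed)
    return np.array([statistic(sample_empirical_game(payoffs, rng))
                     for _ in range(n_boot)])

def uniform_score(game):
    """Each algorithm's mean payoff against a uniformly random opponent."""
    return game.mean(axis=1)

def social_welfare(game):
    """Sum of both players' payoffs per profile (assumes a symmetric game,
    so the column player's payoff in profile (a, b) is game[b, a])."""
    return game + game.T

# Usage with synthetic payoffs (3 algorithms, 5 seeds each); illustrative only.
demo_rng = np.random.default_rng(42)
demo_payoffs = demo_rng.normal(size=(3, 5, 3, 5))
demo_samples = bootstrap_statistic(demo_payoffs, uniform_score, n_boot=200)
# 95% percentile interval of each algorithm's score across bootstrap games.
demo_low, demo_high = np.percentile(demo_samples, [2.5, 97.5], axis=0)
```

Any statistic that maps an empirical game to a number or an array (for example, an indicator of best-response edges) can be passed as `statistic`. The percentile interval is one simple way to summarize the resulting distribution.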