Establishing sound experimental standards and rigour is important in any growing field of research, and Deep Multi-Agent Reinforcement Learning (MARL) is one such nascent field. Although exciting progress has been made, MARL has recently come under scrutiny for replicability issues and a lack of standardised evaluation methodology, specifically in the cooperative setting. Although protocols have been proposed to help alleviate the issue, it remains important to actively monitor the health of the field. In this work, we extend a previously published database of evaluation methodology containing meta-data on MARL publications from top-rated conferences, and compare the findings extracted from this updated database to the trends identified in the original work. Our analysis shows that many of the worrying trends in performance reporting remain: the omission of uncertainty quantification, the failure to report all relevant evaluation details, and a narrowing of the classes of algorithms under development. Promisingly, we do observe a trend towards more difficult scenarios in SMAC-v1 which, if it continues into SMAC-v2, will encourage novel algorithmic development. Our data indicate that replicability must be approached more proactively by the MARL community to ensure trust in the field as we move towards exciting new frontiers.
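To make the call for uncertainty quantification concrete, the sketch below shows one common way to report it: a percentile-bootstrap confidence interval over per-seed returns rather than a bare mean. This is a minimal illustration, not the protocol used in this work; the data values and function name are hypothetical, and established tooling such as rliable implements more robust variants (e.g., interquartile mean with stratified bootstrap).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final episode returns for one algorithm on one scenario,
# one value per independent training seed (10 seeds here).
returns_per_seed = np.array(
    [18.2, 20.1, 17.5, 19.8, 21.0, 16.9, 19.3, 20.4, 18.8, 17.7]
)

def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    # Resample the seeds with replacement and recompute the mean each time.
    idx = rng.integers(0, len(samples), size=(n_resamples, len(samples)))
    resampled_means = samples[idx].mean(axis=1)
    lo, hi = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return samples.mean(), lo, hi

mean, lo, hi = bootstrap_ci(returns_per_seed)
print(f"mean return: {mean:.2f}, 95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the point estimate (and the number of seeds) is the kind of evaluation detail whose omission the analysis above flags.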