In this work, we analyze Multi-Agent Advantage Actor-Critic (MA2C), a recently proposed multi-agent reinforcement learning algorithm that can be applied to adaptive traffic signal control (ATSC) problems. To evaluate its potential, we compare MA2C with Independent Advantage Actor-Critic (IA2C) and with other reinforcement learning and heuristic-based algorithms. Specifically, we analyze MA2C theoretically within the framework of non-Markov decision processes, which gives deeper insight into the algorithm, and we critically examine the effectiveness and robustness of the method by testing it on two traffic areas in Bologna (Italy) simulated in SUMO, a software tool for modeling ATSC problems. Our results indicate that MA2C, trained with pseudo-random vehicle flows, is a promising technique that can outperform the alternative methods.