Rapid urbanization in cities such as Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), in which each traffic signal is typically modeled as an independent agent using Q-learning, has emerged as a promising strategy for reducing average commuter delay. While prior work by Prashant L A et al. has empirically demonstrated the effectiveness of this approach, a rigorous theoretical analysis of its stability and convergence properties in the context of traffic control has not yet been carried out. This paper addresses that gap by focusing on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task and formally analyze the learning dynamics using stochastic approximation methods. The primary contribution of this work is a proof that this multi-agent reinforcement learning algorithm for traffic control converges under the given conditions, obtained by extending existing single-agent convergence proofs for asynchronous value iteration.
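For orientation, the kind of update analyzed in this setting can be sketched in the textbook form of tabular Q-learning, applied independently by each agent. The notation here is an assumption made for illustration and is not taken from the paper: agent $i$ observes a local state $s^i_t$, selects a signal phase $a^i_t$, receives a reward $r^i_{t+1}$ (for example, a negative delay), and uses a discount factor $\gamma \in [0,1)$ with step sizes $\alpha_t$. The exact update and conditions used in the paper may differ.

\[
Q^i_{t+1}(s^i_t, a^i_t) = Q^i_t(s^i_t, a^i_t) + \alpha_t(s^i_t, a^i_t)\Big[ r^i_{t+1} + \gamma \max_{a'} Q^i_t(s^i_{t+1}, a') - Q^i_t(s^i_t, a^i_t) \Big]
\]

\[
\sum_{t} \alpha_t(s, a) = \infty, \qquad \sum_{t} \alpha_t(s, a)^2 < \infty
\]

The second line states the standard step-size conditions under which stochastic approximation arguments for single-agent asynchronous updates are usually made, together with the requirement that every state-action pair be updated infinitely often.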