Multi-agent reinforcement learning (MARL) has shown significant potential in traffic signal control (TSC). However, current MARL-based methods often suffer from insufficient generalization because they are trained on fixed traffic patterns and road network conditions. This limitation results in poor adaptability to new traffic scenarios, leading to high retraining costs and complex deployment. To address this challenge, we propose two algorithms: PLight and PRLight. PLight employs a model-based reinforcement learning approach, pretraining control policies and environment models on predefined source-domain traffic scenarios. The environment model predicts state transitions, enabling comparison of environmental features across domains. PRLight further enhances adaptability by adaptively selecting pretrained PLight agents based on the similarity between the source and target domains, thereby accelerating learning in the target domain. We evaluated the algorithms in two transfer settings: (1) adaptability to different traffic scenarios within the same road network, and (2) generalization across different road networks. The results show that PRLight significantly reduces adaptation time compared to learning from scratch in new TSC scenarios, achieving optimal performance by exploiting the similarity between available source scenarios and the target scenario.
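The similarity-based selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each pretrained agent ships with its source-domain environment model, and uses the model's one-step prediction error on a small batch of target-domain transitions as the (inverse) similarity measure; the function names `prediction_error` and `select_agent` are hypothetical.

```python
import numpy as np

def prediction_error(env_model, transitions):
    """Mean squared error of the environment model's next-state
    predictions on transitions collected in the target domain.
    A lower error suggests the source domain is more similar."""
    errors = [np.mean((env_model(s, a) - s_next) ** 2)
              for (s, a, s_next) in transitions]
    return float(np.mean(errors))

def select_agent(agents, env_models, target_transitions):
    """Pick the pretrained agent whose paired environment model best
    predicts the target-domain dynamics (hypothetical selection rule)."""
    errors = [prediction_error(m, target_transitions) for m in env_models]
    return agents[int(np.argmin(errors))]

# Toy usage: two source-domain models, target dynamics s' = s + 2a.
model_a = lambda s, a: s + a          # source domain A dynamics
model_b = lambda s, a: s + 2.0 * a    # source domain B dynamics
target = [(np.array([0.0]), np.array([1.0]), np.array([2.0])),
          (np.array([1.0]), np.array([0.5]), np.array([2.0]))]
chosen = select_agent(["agent_A", "agent_B"], [model_a, model_b], target)
```

Here the agent paired with `model_b` is chosen because its model matches the target dynamics exactly; in practice the selected agent would then be fine-tuned in the target domain rather than used as-is.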