Life-cycle management of large-scale transportation systems requires determining a sequence of inspection and maintenance decisions that minimizes long-term risks and costs while dealing with multiple uncertainties and constraints lying in high-dimensional spaces. Traditional approaches have been widely applied but often suffer from limitations in optimality, scalability, and the ability to properly handle uncertainty. Moreover, many existing methods rely on unconstrained formulations that overlook critical operational constraints. We address these issues in this work by casting the optimization problem within the framework of constrained Partially Observable Markov Decision Processes (POMDPs), which provide a robust mathematical foundation for stochastic sequential decision-making under observation uncertainty, risk, and resource limitations. To tackle the high dimensionality of the state and action spaces, we propose DDMAC-CTDE, a Deep Decentralized Multi-Agent Actor-Critic (DDMAC) reinforcement learning architecture with Centralized Training and Decentralized Execution (CTDE). To demonstrate the utility of the proposed framework, we also develop a new comprehensive benchmark environment representing an existing transportation network in Virginia, U.S., with heterogeneous pavement and bridge assets undergoing nonstationary degradation. This environment incorporates multiple practical constraints related to budget limits, performance guidelines, traffic delays, and risk considerations. On this benchmark, DDMAC-CTDE consistently learns better policies than standard transportation management baselines. Together, the proposed framework and benchmark provide (i) a scalable, constraint-aware methodology, and (ii) a realistic, rigorous testbed for comprehensive evaluation of Deep Reinforcement Learning (DRL) for transportation infrastructure management.