Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management

We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle. Life-cycle management of such engineering systems is a computationally intensive task. It requires sequential inspection and maintenance decisions that reduce long-term risks and costs, while handling multiple uncertainties and constraints defined over high-dimensional spaces. To date, this class of optimization problems has mostly been addressed with static age- or condition-based maintenance methods and with risk-based or periodic inspection plans. Such approaches, however, often suffer from limitations in optimality, scalability, and the treatment of uncertainty. In this work, the optimization problem is cast as a constrained Partially Observable Markov Decision Process (POMDP). This framework provides a comprehensive mathematical basis for stochastic sequential decision settings with observation uncertainties, risk considerations, and limited resources. To address very large state and action spaces, we develop a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method with Centralized Training and Decentralized Execution (CTDE), termed DDMAC-CTDE. The performance of DDMAC-CTDE is demonstrated on a representative and realistic example application: an existing transportation network in Virginia, USA. The network includes several bridge and pavement components with nonstationary degradation, agency-imposed constraints, and traffic delay and risk considerations. The proposed DDMAC-CTDE method vastly outperforms traditional management policies for transportation networks. Overall, the proposed algorithmic framework provides near-optimal solutions for transportation infrastructure management under real-world constraints and complexities.