Cost-effective asset management is of interest across several industries. This paper develops a deep reinforcement learning (DRL) solution that automatically determines an optimal rehabilitation policy for continuously deteriorating water pipes. We approach rehabilitation planning in both online and offline DRL settings. In the online setting, the agent interacts with a simulated environment of multiple pipes with distinct lengths, materials, and failure-rate characteristics. We train the agent with deep Q-learning (DQN) to learn a policy that minimizes average cost and reduces failure probability. In the offline setting, the agent learns from static data, e.g., DQN replay data, via a conservative Q-learning algorithm, without further interaction with the environment. We demonstrate that DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives. Additionally, learning offline from the fixed DQN replay dataset further improves performance. The results suggest that existing deterioration profiles of water pipes, which comprise large and diverse state and action trajectories, offer a valuable avenue for learning rehabilitation policies offline; these policies can then be fine-tuned using the simulator.
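The abstract does not state the network architecture, the exact conservative Q-learning variant, or its hyperparameters. The sketch below is therefore only an illustration, assuming the common CQL(H) objective for discrete actions. It shows how the offline loss differs from plain DQN: alongside the usual TD error, a penalty pushes down a soft maximum of Q over all actions and pushes up Q on the actions actually stored in the replay data. Q-values are passed in as arrays instead of being produced by a network. The function name `cql_loss`, the weights `alpha` and `gamma`, and the action encoding in the usage note are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))).squeeze(axis)


def cql_loss(q, q_next_target, actions, rewards, dones, gamma=0.99, alpha=1.0):
    """Hypothetical CQL(H)-style loss for a batch of discrete-action transitions.

    q:             (B, A) Q(s, .) from the online network
    q_next_target: (B, A) Q_target(s', .) from the target network
    actions:       (B,)   integer actions taken in the logged (replay) data
    rewards:       (B,)   rewards, e.g. negative rehabilitation/failure costs
    dones:         (B,)   1.0 where the episode ended, else 0.0
    """
    idx = np.arange(len(actions))
    q_data = q[idx, actions]  # Q-values of the dataset actions
    # DQN-style TD target: bootstrap from the best next action unless terminal
    target = rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)
    td_loss = 0.5 * np.mean((q_data - target) ** 2)
    # Conservative penalty: soft-max over all actions minus Q of the logged action.
    # It is >= 0 and shrinks as Q favours actions that appear in the data.
    conservative = np.mean(logsumexp(q, axis=1) - q_data)
    return td_loss + alpha * conservative
```

For example, with three hypothetical actions per pipe (0 = do nothing, 1 = repair, 2 = replace) and all Q-values at zero, the TD term vanishes and the loss reduces to `alpha * log(3)`. Setting `alpha=0` recovers an ordinary DQN regression loss on the fixed dataset, which makes clear that the conservative penalty is the only ingredient separating the offline objective from plain DQN.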