Reinforcement learning (RL) has shown promise in solving various combinatorial optimization problems. However, conventional RL faces challenges with real-world constraints, especially when the feasibility of the action space is explicit and depends on the corresponding state or trajectory. In this work, we apply RL to container shipping, often considered the cornerstone of global trade, by addressing the critical challenge of master stowage planning. The main objective is to maximize cargo revenue and minimize operational costs while navigating demand uncertainty and various complex operational constraints, namely vessel capacity and stability, which must be dynamically updated along the vessel's voyage. To address this problem, we implement a deep reinforcement learning framework with feasibility projection to solve the master stowage planning problem (MPP) under demand uncertainty. The experimental results show that our architecture efficiently finds adaptive, feasible solutions for this multi-stage stochastic optimization problem, outperforming traditional mixed-integer programming and RL with feasibility regularization. Our AI-driven decision-support policy enables adaptive and feasible planning under uncertainty, optimizing operational efficiency and capacity utilization while contributing to sustainable and resilient global supply chains.
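To illustrate the idea of feasibility projection, the sketch below maps a raw policy output onto a feasible set defined by per-cargo demand bounds and a total capacity budget. This is a minimal, hypothetical example: the function name, the continuous allocation action, and the simple clip-and-scale projection are assumptions for illustration, not the paper's actual projection layer or constraint set.

```python
import numpy as np

def project_to_feasible(action, demand, capacity):
    """Project a raw policy action onto a simplified feasible set:
    0 <= x_i <= demand_i for each cargo class, and sum(x_i) <= capacity.

    This is an illustrative stand-in for a feasibility projection layer;
    real stowage constraints (e.g., stability) require richer projections.
    """
    # Enforce per-cargo bounds: allocate no less than zero, no more than demand.
    x = np.clip(np.asarray(action, dtype=float), 0.0, demand)
    # Enforce the aggregate capacity constraint by uniform down-scaling.
    total = x.sum()
    if total > capacity:
        x *= capacity / total
    return x
```

Because the projected action is feasible by construction, the policy never needs to learn the constraints through penalty signals alone, which is the intuition behind preferring projection over feasibility regularization.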