Service region design determines the geographic coverage of service networks and thereby shapes long-term operational performance. Capital and operational constraints preclude simultaneous large-scale deployment, so expansion must proceed sequentially. The resulting challenge is to determine when and where to invest under demand uncertainty, balancing the intertemporal trade-off between early and delayed investment while accounting for network effects, whereby each deployment reshapes future demand through inter-regional connectivity. This study addresses a sequential service region design (SSRD) problem that incorporates two practical yet underexplored factors: a $k$-region constraint that limits the number of regions that can receive investment in each period, and a stochastic spillover effect that links investment decisions to demand evolution. The problem requires sequencing regional portfolios under uncertainty, which leads to a combinatorial explosion in the number of feasible investment sequences. To address this challenge, we propose a solution framework that integrates real options analysis (ROA) with a Transformer-based Proximal Policy Optimization (TPPO) algorithm. ROA evaluates the intertemporal option value of investment sequences, while TPPO learns sequential policies that directly generate sequences with high option value without exhaustive enumeration. Numerical experiments on realistic multi-region settings demonstrate that TPPO converges faster than benchmark deep reinforcement learning (DRL) methods and consistently identifies sequences with higher option value. Case studies and sensitivity analyses further confirm the robustness of the approach and provide insights into investment concurrency and regional prioritization; they also show that the benefits of adaptive expansion under our approach increase with stronger spillovers and more dynamic market conditions.
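To give a sense of the combinatorial explosion mentioned above, the following minimal sketch counts investment sequences under a simplified reading of the $k$-region constraint. It is an illustration only, not the formulation used in this study: it assumes that every period invests in at least one and at most $k$ not-yet-served regions, that expansion continues until all regions are covered, and that there is no horizon limit or option to wait. The function name `count_investment_sequences` and its parameters are hypothetical.

```python
from functools import lru_cache
from math import comb


def count_investment_sequences(n_regions: int, k: int) -> int:
    """Count ordered investment sequences covering all regions.

    Assumption (not from the paper): each period selects between 1 and k
    of the remaining regions, and the process stops once every region
    has received investment.
    """
    if n_regions < 0 or k < 1:
        raise ValueError("n_regions must be >= 0 and k must be >= 1")

    @lru_cache(maxsize=None)
    def sequences(remaining: int) -> int:
        # With nothing left to invest in, there is exactly one (empty) continuation.
        if remaining == 0:
            return 1
        # Choose j regions for the current period, then sequence the rest.
        return sum(
            comb(remaining, j) * sequences(remaining - j)
            for j in range(1, min(k, remaining) + 1)
        )

    return sequences(n_regions)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, count_investment_sequences(10, k))
```

With $k = 1$ the count reduces to $n!$, and it grows further as $k$ increases, which is why exhaustive enumeration becomes impractical even for a moderate number of regions under these simplifying assumptions.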