Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery. In this context, the goal is to reliably deliver packets within strict deadlines imposed by the application while minimizing the overall resource allocation cost. A large body of work has leveraged stochastic optimization techniques to design efficient dynamic routing and scheduling solutions under average delay constraints; however, these methods fall short when faced with strict per-packet delay requirements. We formulate the minimum-cost delay-constrained network control problem as a constrained Markov decision process and apply constrained deep reinforcement learning (CDRL) techniques to minimize the total resource allocation cost while keeping timely throughput above a target reliability level. Results indicate that the proposed CDRL-based solution ensures timely packet delivery in settings where existing baselines fail, and that it achieves lower cost than other throughput-maximizing methods.
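To make the constrained formulation concrete, the sketch below illustrates the primal-dual Lagrangian template underlying many CDRL methods, on a deliberately tiny stand-in problem: a one-step choice between a cheap, unreliable link and a costly, reliable one, with a timely-throughput constraint. All numbers (link costs, delivery probabilities, the 0.9 reliability target) are hypothetical and chosen only for illustration; this is not the paper's actual algorithm or network model.

```python
import numpy as np

# Illustrative primal-dual Lagrangian relaxation for a toy constrained MDP.
# Two hypothetical links:
#   link 0: cost 1, on-time delivery prob 0.70  (cheap, unreliable)
#   link 1: cost 3, on-time delivery prob 0.99  (costly, reliable)
# Objective: minimize expected cost s.t. timely throughput >= TARGET.
COSTS = np.array([1.0, 3.0])
RELIAB = np.array([0.70, 0.99])
TARGET = 0.9

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

theta, lam = 0.0, 0.0            # policy parameter, Lagrange multiplier
lr_theta, lr_lam = 0.1, 0.05
history = []

for t in range(50_000):
    p = sigmoid(theta)           # probability of picking the reliable link
    cost = (1 - p) * COSTS[0] + p * COSTS[1]
    rel = (1 - p) * RELIAB[0] + p * RELIAB[1]
    # Lagrangian L = cost + lam * (TARGET - rel):
    # gradient descent in theta, projected gradient ascent in lam.
    grad_p = (COSTS[1] - COSTS[0]) - lam * (RELIAB[1] - RELIAB[0])
    theta -= lr_theta * grad_p * p * (1 - p)   # chain rule through sigmoid
    lam = max(0.0, lam + lr_lam * (TARGET - rel))
    history.append((cost, rel))

# Averaged iterates approximate the saddle point: the constraint is met
# (timely throughput near the target) at lower cost than always using
# the reliable link.
avg_cost, avg_rel = np.mean(history[25_000:], axis=0)
print(f"avg cost {avg_cost:.3f}, avg timely throughput {avg_rel:.3f}")
```

The same descent-ascent structure carries over when the policy is a deep network and the gradients are estimated from sampled trajectories, which is the regime CDRL methods operate in.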