Multi-hop uncrewed aerial vehicle (UAV) networks are promising for extending terrestrial network coverage. Existing multi-hop UAV networks employ a single routing path by selecting the next-hop forwarding node in a hop-by-hop manner, which leads to local congestion and increases traffic delays. In this paper, a novel traffic-adaptive multipath routing method is proposed for multi-hop UAV networks, which enables each UAV to dynamically split and forward traffic flows across multiple next-hop neighbors, thus meeting the latency requirements of diverse traffic flows in dynamic mobile environments. An on-time packet delivery ratio maximization problem is formulated to determine the traffic splitting ratios at each hop. This sequential decision-making problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). To solve this Dec-POMDP, a novel multi-agent deep reinforcement learning (MADRL) algorithm, termed Independent Proximal Policy Optimization with Dirichlet Modeling (IPPO-DM), is developed. Specifically, IPPO serves as the core optimization framework, where the Dirichlet distribution is leveraged to parameterize a continuous stochastic policy network on the probability simplex, inherently ensuring feasible traffic splitting ratios. Simulation results demonstrate that IPPO-DM outperforms benchmark schemes in terms of both delivery latency guarantees and packet loss performance.
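The key property exploited by the Dirichlet parameterization is that any sample lies on the probability simplex, so the sampled splitting ratios are non-negative and sum to one by construction. A minimal sketch of this idea (assuming NumPy; the concentration parameters and the three-neighbor setting are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concentration parameters that a policy network might
# output for a UAV with three next-hop neighbors (values illustrative).
alpha = np.array([2.0, 1.0, 3.0])

# A Dirichlet(alpha) sample is a point on the probability simplex:
# each component is non-negative and the components sum to 1,
# so it can be used directly as traffic splitting ratios.
ratios = rng.dirichlet(alpha)

assert np.all(ratios >= 0.0)
assert np.isclose(ratios.sum(), 1.0)
```

Because feasibility holds for every sample, no projection or normalization step is needed after the policy network's output, which is what makes the Dirichlet distribution a natural fit for traffic-splitting actions.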