Multicast communication is widely used in wireless environments with high device density. Traditional wireless network architectures struggle to obtain and maintain global network state information flexibly and cannot respond quickly to changes in network state, which degrades the throughput, delay, and other QoS metrics of existing multicast solutions. This paper therefore proposes a new multicast routing method based on multiagent deep reinforcement learning (MADRL-MR) for software-defined wireless networking (SDWN) environments. First, SDWN technology is adopted to configure the network flexibly and to obtain network state information in the form of traffic matrices that represent global link information, such as link bandwidth, delay, and packet loss rate. Second, the multicast routing problem is divided into multiple subproblems, which are solved through multiagent cooperation. So that each agent can accurately perceive the current network state and the status of multicast tree construction, the state space of each agent is designed based on the traffic and multicast tree status matrices, and the set of AP nodes in the network is used as the action space. A novel single-hop action strategy is designed, along with a reward function based on the four states that may occur during tree construction: progress, invalid, loop, and termination. Finally, a decentralized training approach is combined with transfer learning to enable each agent to adapt quickly to dynamic network changes and to accelerate convergence. Simulation experiments show that MADRL-MR outperforms existing algorithms on metrics such as throughput, delay, and packet loss rate, and can establish more intelligent multicast routes.
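The single-hop action strategy and the four-state reward can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: each agent is taken to build a path from the source toward one destination, the network is a plain adjacency map, and the names `StepState`, `REWARDS`, `classify_step`, and `step`, as well as the numeric reward values, are hypothetical placeholders chosen for illustration.

```python
from enum import Enum


class StepState(Enum):
    """The four outcomes of a single-hop action named in the abstract."""
    PROGRESS = "progress"
    INVALID = "invalid"
    LOOP = "loop"
    TERMINATION = "termination"


# Hypothetical reward values; the abstract does not give the actual ones.
REWARDS = {
    StepState.PROGRESS: 0.1,
    StepState.INVALID: -1.0,
    StepState.LOOP: -0.5,
    StepState.TERMINATION: 1.0,
}


def classify_step(adjacency, path, destination, action):
    """Classify the choice of node `action` as the next hop after path[-1]."""
    current = path[-1]
    if action not in adjacency[current]:
        return StepState.INVALID      # no link from the current node
    if action in path:
        return StepState.LOOP         # node already visited on this path
    if action == destination:
        return StepState.TERMINATION  # destination reached
    return StepState.PROGRESS         # valid hop, construction continues


def step(adjacency, path, destination, action):
    """Apply one single-hop action; return (new_path, reward, done)."""
    state = classify_step(adjacency, path, destination, action)
    if state in (StepState.PROGRESS, StepState.TERMINATION):
        path = path + [action]        # only valid hops extend the path
    return path, REWARDS[state], state is StepState.TERMINATION


# Toy topology with four AP nodes (assumed for illustration).
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
path = [0]
for action in (3, 1, 0, 3):
    state = classify_step(adjacency, path, 3, action)
    path, reward, done = step(adjacency, path, 3, action)
    print(action, state.value, reward, path, done)
```

In this sketch the action space is the full node set, so an agent may pick a node that is not a neighbor of its current position; such a choice is classified as invalid and penalized rather than masked out, which is one plausible reading of why the reward distinguishes invalid and loop steps from progress and termination.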