Wireless network topologies change rapidly, so obtaining network status information efficiently and forwarding data flexibly enough to maintain communication quality of service remain important challenges. This article introduces DRL-PPONSA, an intelligent routing algorithm that combines proximal policy optimization (PPO) deep reinforcement learning (DRL) with network situational awareness under a software-defined wireless networking architecture. First, a dedicated data plane is designed for network topology construction and data forwarding. The control plane collects network traffic information, issues flow tables, and uses a GCN-GRU prediction mechanism to anticipate future traffic trends, thereby achieving network situational awareness. Second, a DRL-based data forwarding mechanism is designed in the knowledge plane. The predicted network traffic matrix and the topology information matrix serve as the environment for the DRL agents, and adjacent next-hop nodes serve as the executable actions. Action selection strategies are designed for different network conditions to achieve more intelligent, flexible, and efficient routing control. The reward function is built from network link information and a set of reward and penalty mechanisms. In addition, importance sampling and gradient clipping are applied during gradient updates to improve convergence speed and stability. Experimental results show that DRL-PPONSA outperforms traditional routing methods in network throughput, delay, packet loss rate, and wireless node distance. Compared with value-function-based Dueling DQN routing, it converges significantly faster and more stably. It also consumes less hardware storage space and can make efficient routing decisions in real time from the current network state information.
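To make the action space and the update rule described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard PPO clipped surrogate objective, in which the importance-sampling ratio between the new and old policies is clipped, and a softmax policy restricted to adjacent next-hop nodes. The function names `next_hop_probs` and `ppo_clip_loss`, the `clip_eps` value of 0.2, and the use of raw per-node scores as policy logits are all illustrative assumptions; the abstract does not specify any of them.

```python
import math


def next_hop_probs(logits, is_neighbor):
    """Softmax over candidate next hops, restricted to adjacent nodes.

    logits      -- hypothetical per-node scores from a policy network
    is_neighbor -- True where the node is adjacent to the current node
    Non-adjacent nodes receive probability 0, so they can never be chosen.
    """
    valid = [score for score, ok in zip(logits, is_neighbor) if ok]
    if not valid:
        raise ValueError("node has no adjacent next hop")
    top = max(valid)  # subtract the max for numerical stability
    weights = [
        math.exp(score - top) if ok else 0.0
        for score, ok in zip(logits, is_neighbor)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (assumed, not taken from the paper).

    new_logp / old_logp -- log-probabilities of the taken actions under the
                           current policy and the data-collecting policy
    advantages          -- advantage estimates for those actions
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(new_lp - old_lp)  # importance-sampling ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the two surrogates, then negate
        # because optimizers minimize.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

With a positive advantage, the clip removes any incentive to push the ratio above `1 + clip_eps`; with a negative advantage the unclipped term is kept, so a step that makes a bad action more likely is still fully penalized. This bounded update is the usual explanation for the stability of PPO relative to unclipped policy-gradient updates.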