The Space-Air-Ground Integrated Network (SAGIN) is a foundational communication infrastructure that offers opportunities for highly efficient global data transmission. However, because SAGIN is a dynamic, heterogeneous network, conventional network optimization methods struggle to meet the stringent latency and stability requirements of data transmission in this environment. This paper therefore proposes differentiated federated reinforcement learning (DFRL) to solve the traffic offloading problem in SAGIN, using multiple agents to generate differentiated traffic offloading policies. To account for the differing characteristics of each SAGIN region, DFRL models traffic offloading policy optimization as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP). To solve it, the paper proposes a novel Differentiated Federated Soft Actor-Critic (DFSAC) algorithm, which takes network packet delay as the joint reward and introduces a global trend model as the joint target action-value function of each agent to guide that agent's policy update. Simulation results show that the DFSAC-based traffic offloading policy achieves better network throughput, packet loss rate, and packet delay than the traditional federated reinforcement learning approach and other baseline approaches.
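The two mechanisms named in the abstract, a delay-based joint reward and a global trend model used as each agent's joint target action-value function, can be illustrated with a minimal sketch. The abstract does not say how the global trend model is built, so this sketch assumes plain element-wise averaging of the agents' critic parameters (FedAvg-style) followed by a soft (Polyak) update of a shared target. The function names `joint_reward`, `aggregate_global_trend`, and `soft_update`, the parameter layout, and the value of `tau` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical layout: a critic is a dict of named, flat parameter lists.
Params = Dict[str, List[float]]


def joint_reward(packet_delays_ms: List[float]) -> float:
    """Shared reward: negative mean packet delay, so lower delay scores higher."""
    if not packet_delays_ms:
        return 0.0
    return -sum(packet_delays_ms) / len(packet_delays_ms)


def aggregate_global_trend(local_critics: List[Params]) -> Params:
    """Element-wise average of the agents' critic parameters.

    Assumption: simple averaging stands in for the paper's global trend model.
    """
    n = len(local_critics)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in local_critics))]
        for name in local_critics[0]
    }


def soft_update(target: Params, source: Params, tau: float) -> Params:
    """Polyak step that moves a target critic toward the aggregated model."""
    return {
        name: [(1 - tau) * t + tau * s for t, s in zip(target[name], source[name])]
        for name in target
    }


# Three regional agents, each with its own locally trained critic (toy values).
local_critics = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
global_trend = aggregate_global_trend(local_critics)
shared_target = soft_update({"w": [0.0, 0.0], "b": [0.0]}, global_trend, tau=0.5)
reward = joint_reward([10.0, 20.0, 30.0])
```

In this sketch only the target action-value parameters are shared. Each agent would keep its own actor, which is one way to read "differentiated": policies stay region-specific while the shared target supplies a common learning signal.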