Differentiated Federated Reinforcement Learning Based Traffic Offloading on Space-Air-Ground Integrated Networks

The Space-Air-Ground Integrated Network (SAGIN) is a foundational communication infrastructure that offers opportunities for highly efficient global data transmission. However, because SAGIN is a dynamic, heterogeneous network, conventional network optimization methods struggle to meet its stringent latency and stability requirements for data transmission. This paper therefore proposes differentiated federated reinforcement learning (DFRL) to solve the traffic offloading problem in SAGIN, using multiple agents to generate differentiated traffic offloading policies. To account for the differing characteristics of each SAGIN region, DFRL models traffic offloading policy optimization as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP). The paper proposes a novel Differentiated Federated Soft Actor-Critic (DFSAC) algorithm to solve this problem. DFSAC takes network packet delay as the joint reward and introduces a global trend model as the joint target action-value function of each agent, which guides each agent's policy update. Simulation results show that the DFSAC-based traffic offloading policy achieves better network throughput, packet loss rate, and packet delay than the traditional federated reinforcement learning approach and other baseline approaches.
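The abstract does not say how the global trend model is built or how it enters each agent's update, so the following is only a rough sketch under stated assumptions, not the paper's algorithm. It assumes that (a) the global trend model is a FedAvg-style weighted average of the agents' critic parameters, (b) the joint reward is the negative mean packet delay, and (c) each agent uses the standard Soft Actor-Critic target with the global model supplying the target action-value. All function names and parameters here are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical flat parameter layout per critic


def aggregate_global_trend(agent_params: List[Params], weights: List[float]) -> Params:
    """Weighted average of per-agent critic parameters.

    ASSUMPTION: FedAvg-style averaging stands in for the paper's
    global trend model, whose construction the abstract does not give.
    """
    if not agent_params or len(agent_params) != len(weights):
        raise ValueError("need one weight per agent and at least one agent")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    global_params: Params = {}
    for name in agent_params[0]:
        size = len(agent_params[0][name])
        global_params[name] = [
            sum(w * p[name][k] for p, w in zip(agent_params, weights)) / total
            for k in range(size)
        ]
    return global_params


def joint_reward(packet_delays_ms: List[float]) -> float:
    """ASSUMPTION: joint reward is the negative mean packet delay."""
    if not packet_delays_ms:
        raise ValueError("need at least one delay sample")
    return -sum(packet_delays_ms) / len(packet_delays_ms)


def soft_td_target(reward: float, gamma: float, next_q_global: float,
                   alpha: float, next_log_prob: float) -> float:
    """Standard SAC target: r + gamma * (Q_target(s', a') - alpha * log pi(a'|s')).

    ASSUMPTION: Q_target is evaluated with the shared global trend model,
    while each agent keeps its own (differentiated) policy pi.
    """
    return reward + gamma * (next_q_global - alpha * next_log_prob)
```

In this sketch only the critic parameters are shared; each agent's policy stays local, which is one plausible reading of "differentiated" policies guided by a joint target.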

