This paper introduces a Multi-Agent Deep Reinforcement Learning (MA-DRL) approach for routing in Low Earth Orbit Satellite Constellations (LSatCs). Each satellite is an independent decision-making agent with partial knowledge of the environment, supported by feedback received from nearby agents. Building on our previous work, which introduced a Q-routing solution, this paper extends that solution to a deep learning framework that adapts quickly to network and traffic changes and operates in two phases: (1) an offline exploration learning phase that relies on a global Deep Neural Network (DNN) to learn the optimal paths at each possible position and congestion level; and (2) an online exploitation phase with local, on-board, pre-trained DNNs. Results show that MA-DRL efficiently learns optimal routes offline, which are then loaded for efficient distributed routing online.
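For readers unfamiliar with the Q-routing baseline that the abstract builds on, the following is a minimal sketch of the classical tabular Q-routing update, in which each agent refines its delivery-time estimate using feedback from the neighbor it forwarded to. It is not the MA-DRL method itself: the paper replaces the table with DNNs, and the abstract does not specify the state, reward, or network architecture. The class name `QRoutingAgent`, its methods, the learning rate `alpha`, and the node labels are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class QRoutingAgent:
    """Tabular Q-routing agent (illustrative sketch, one instance per satellite)."""

    def __init__(self, node_id, neighbors, alpha=0.5):
        self.node_id = node_id
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        # q[(destination, neighbor)] = estimated delivery time via that neighbor.
        # Entries start at 0.0, i.e. optimistically low.
        self.q = defaultdict(float)

    def best_estimate(self, destination):
        """Feedback sent to an upstream agent: best known time to destination."""
        if destination == self.node_id:
            return 0.0
        return min(self.q[(destination, n)] for n in self.neighbors)

    def choose_next_hop(self, destination):
        """Greedy (exploitation) choice: neighbor with the lowest estimate."""
        return min(self.neighbors, key=lambda n: self.q[(destination, n)])

    def update(self, destination, neighbor, link_delay, neighbor_estimate):
        """Move the estimate toward link_delay + the neighbor's own best estimate."""
        key = (destination, neighbor)
        target = link_delay + neighbor_estimate
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]


# Usage: satellite "a" has two neighbors and learns from their feedback.
agent = QRoutingAgent("a", ["b", "c"], alpha=0.5)
agent.update("d", "b", link_delay=4.0, neighbor_estimate=6.0)
agent.update("d", "c", link_delay=1.0, neighbor_estimate=2.0)
next_hop = agent.choose_next_hop("d")
```

In this sketch `link_delay` stands for the queueing plus transmission time observed on the hop, and `neighbor_estimate` is the value the neighbor reports back via its own `best_estimate`. Only this local exchange is needed, which matches the partial-knowledge setting described in the abstract.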