The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging because of the long propagation delay between terrestrial users and satellites, which makes the CSI observed at the satellite side outdated. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BS) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm that aims to maximise the sum-rate of the users while coping with outdated CSI. We design a novel bi-level optimisation procedure, termed dual-stage proximal policy optimisation (DS-PPO), to tackle the problems of large continuous action spaces and of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite, and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections, as well as the sum-rate improvement achieved by DS-PPO. In addition, we provide a convergence analysis of DS-PPO along with its computational complexity.
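To make the objective concrete, the following is a minimal sketch of why outdated CSI hurts the downlink sum-rate, R = sum_k log2(1 + SINR_k) with SINR_k = |h_k^H w_k|^2 / (sum_{j != k} |h_k^H w_j|^2 + sigma^2). It is not the DS-PPO algorithm. The channel model, the zero-forcing precoder, the first-order Gauss-Markov ageing model (correlation `rho`), and the antenna/user counts are all illustrative assumptions. The stacked channel matrix stands in for the distributed multi-antenna BS formed by the cooperating satellites. A learned policy such as the one described above would replace the fixed precoder.

```python
import numpy as np


def sum_rate(H, W, noise_power=1.0):
    """Downlink sum-rate in bit/s/Hz under linear precoding.

    H: (K, N) channel matrix; row k is h_k^H for user k.
    W: (N, K) precoding matrix; column k is the beam w_k for user k.
    """
    power = np.abs(H @ W) ** 2                   # power[k, j] = |h_k^H w_j|^2
    signal = np.diag(power)                      # desired-signal power per user
    interference = power.sum(axis=1) - signal    # inter-user interference per user
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))


def zf_precoder(H_est, total_power=1.0):
    """Zero-forcing precoder built from a (possibly outdated) channel estimate,
    scaled to meet a total (Frobenius-norm) transmit power constraint."""
    W = np.linalg.pinv(H_est)                    # (N, K) right pseudo-inverse
    return W * np.sqrt(total_power) / np.linalg.norm(W)


def complex_gaussian(rng, shape):
    """i.i.d. CN(0, 1) entries (hypothetical Rayleigh-fading assumption)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


rng = np.random.default_rng(0)
K, N = 4, 8          # hypothetical: 4 users, 8 antennas pooled across satellites
rho = 0.7            # hypothetical temporal correlation over the propagation delay
P = 100.0            # hypothetical total transmit power (noise power = 1)

H_old = complex_gaussian(rng, (K, N))                                    # CSI seen by the satellites
H_now = rho * H_old + np.sqrt(1 - rho**2) * complex_gaussian(rng, (K, N))  # channel at transmission

rate_perfect = sum_rate(H_now, zf_precoder(H_now, P))    # precoding on current CSI
rate_outdated = sum_rate(H_now, zf_precoder(H_old, P))   # precoding on outdated CSI
```

With outdated CSI, the zero-forcing beams no longer null the inter-user interference on the current channel, so the sum-rate drops. This loss is the gap a CSI-robust policy is meant to narrow.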