In this letter, we investigate the discrete phase shift design of an intelligent reflecting surface (IRS) in a time division duplexing (TDD) multi-user multiple-input multiple-output (MIMO) system. We modify the deep reinforcement learning (DRL) scheme so that it maximizes the average downlink data transmission rate without requiring sub-channel channel state information (CSI). Based on the characteristics of the model, we modify the proximal policy optimization (PPO) algorithm and integrate a gated recurrent unit (GRU) to tackle the non-convex optimization problem. Simulation results show that the proposed PPO-GRU surpasses the benchmarks in performance, convergence speed, and training stability.
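As a rough illustration of what a discrete phase shift design involves, the sketch below builds a set of uniformly spaced phase levels and evaluates the gain of a toy reflected link. It rests on assumptions not stated in the abstract: uniform quantization with a hypothetical resolution parameter `bits`, unit-amplitude reflection, and a single-antenna link rather than the multi-user MIMO system studied here. All function names are illustrative and are not taken from the letter.

```python
import cmath
import math


def discrete_phase_set(bits):
    """Return the 2**bits uniformly spaced phase levels in [0, 2*pi) (assumed model)."""
    levels = 2 ** bits
    return [2 * math.pi * k / levels for k in range(levels)]


def quantize_phase(theta, bits):
    """Map a continuous phase in radians to the index of the nearest discrete level."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    # Wrap into [0, 2*pi), round to the nearest level, and fold the top bin back to 0.
    return int(round((theta % (2 * math.pi)) / step)) % levels


def reflection_coefficients(indices, bits):
    """Unit-amplitude coefficients exp(j*theta_n) for the chosen level indices."""
    phases = discrete_phase_set(bits)
    return [cmath.exp(1j * phases[i]) for i in indices]


def cascaded_gain(h, g, indices, bits):
    """Toy single-antenna gain |sum_n h_n * phi_n * g_n|**2 through the IRS."""
    phi = reflection_coefficients(indices, bits)
    return abs(sum(hn * pn * gn for hn, pn, gn in zip(h, phi, g))) ** 2
```

In this toy setting, a learning agent's action would correspond to the list `indices`, one level index per reflecting element. The number of candidate configurations grows as `(2**bits) ** N` for `N` elements, which gives some intuition for why a learning-based search is attractive.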