We propose a deep reinforcement learning (DRL) approach for full-duplex (FD) transmission that predicts the reconfigurable intelligent surface (RIS) phase shifts, base station (BS) active beamformers, and transmit powers to maximize the weighted sum rate of uplink (UL) and downlink (DL) users. Existing methods require channel state information (CSI) and knowledge of the residual self-interference (SI) to compute the active beamformers or the DRL rewards, and they typically fail when CSI or residual SI is unavailable. For time-varying channels in particular, the CSI must be estimated and signaled to the DRL agent at every time step, which is costly. To address this, we propose a two-stage DRL framework with minimal signaling overhead. The first stage uses the least squares method to initiate learning by partially canceling the residual SI. The second stage uses DRL to achieve performance comparable to existing CSI-based methods without requiring the CSI or the exact residual SI. Furthermore, the proposed DRL framework with quantized RIS phase shifts reduces BS-to-RIS signaling, requiring $32$ times fewer bits than the continuous version. Quantization also shrinks the action space, yielding faster convergence and $7.1\%$ and $22.28\%$ better UL and DL rates, respectively, than the continuous method.
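The first-stage least-squares SI cancellation can be sketched as follows. This is a minimal illustration under assumed parameters (pilot length, number of SI channel taps, noise level), not the paper's implementation: the known transmit signal is used to estimate the SI channel in the least-squares sense, and the reconstructed SI is subtracted from the received signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: pilot length and number of SI channel taps.
n_samples, n_taps = 256, 4

# Known transmit pilot and an (unknown to the receiver) SI channel.
x = rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)
h_si = rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)

# Convolution matrix of the transmit signal (circular for simplicity).
X = np.column_stack([np.roll(x, k) for k in range(n_taps)])

# Received SI plus a small amount of noise (assumed noise level).
noise = 0.01 * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
y = X @ h_si + noise

# Least-squares estimate of the SI channel, then cancel the reconstructed SI.
h_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ h_hat

# Cancellation leaves only (roughly) the noise floor.
print(np.linalg.norm(residual) / np.linalg.norm(y))
```

After this partial cancellation, the second-stage DRL agent only has to cope with the remaining residual SI rather than the full SI power, which is what lets it learn without the exact residual SI value.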