Bistatic backscatter communication promises ubiquitous, massive connectivity for future Internet-of-Things (IoT) networks by using passive tags that connect to a reader by reflecting carrier emitter (CE) signals. This study focuses on the joint design of the transmit/receive beamformers at the CE/reader and the reflection coefficient of the tag. A throughput maximization problem is formulated, subject to the tag's requirements. We develop the joint design through a series of trial-and-error interactions with the environment, driven by a predefined reward in a continuous state and action space. We propose two deep reinforcement learning (DRL) algorithms to solve the underlying optimization problem, namely deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Simulation results indicate that the proposed algorithms learn from the environment and incrementally improve their behavior, achieving performance on par with two leading benchmarks. We also compare the proposed methods against deep Q-network (DQN), double deep Q-network (DDQN), and dueling DQN (DuelDQN). For a system with twelve antennas, SAC leads with a 26.76% gain over DQN, followed by alternating optimization (AO) at 23.02% and DDPG at 19.16%; DDQN and DuelDQN show smaller gains of 10.40% and 14.36%, respectively, over DQN.
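The continuous-action formulation above can be illustrated with a minimal DDPG-style deterministic actor: a small network maps channel-state features to a bounded action containing a unit-power transmit beamformer and a tag reflection coefficient in [0, 1]. This is a sketch under assumed shapes and feature layout, not the authors' architecture; the names `actor_policy`, `w_hidden`, and `w_out` are illustrative.

```python
import numpy as np

def actor_policy(state, w_hidden, w_out, num_antennas):
    """Map a state vector to a continuous action (sketch, not the paper's
    network): a unit-power complex transmit beamformer f and a tag
    reflection coefficient alpha in [0, 1]."""
    h = np.tanh(state @ w_hidden)            # hidden layer
    raw = np.tanh(h @ w_out)                 # bounded raw action in (-1, 1)
    # First 2*num_antennas entries: real/imaginary parts of the beamformer.
    re = raw[:num_antennas]
    im = raw[num_antennas:2 * num_antennas]
    f = re + 1j * im
    f = f / np.linalg.norm(f)                # enforce unit transmit power
    alpha = 0.5 * (raw[-1] + 1.0)            # rescale (-1, 1) -> [0, 1]
    return f, alpha

# Illustrative rollout with random (untrained) weights.
rng = np.random.default_rng(0)
M = 12                                        # antennas, as in the abstract
state = rng.standard_normal(4 * M)            # stacked real/imag channel features (assumed)
w_hidden = 0.1 * rng.standard_normal((4 * M, 64))
w_out = 0.1 * rng.standard_normal((64, 2 * M + 1))
f, alpha = actor_policy(state, w_hidden, w_out, M)
print(np.linalg.norm(f), alpha)
```

In training, a critic would score the reward (throughput) of each such action, and the actor weights would be updated along the critic's gradient; SAC additionally injects entropy-regularized stochastic exploration.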