Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach

This paper studies an integrated sensing and communications (ISAC) system for the low-altitude economy (LAE), in which a ground base station (GBS) provides communication and navigation services to authorized unmanned aerial vehicles (UAVs) while sensing the low-altitude airspace to monitor an unauthorized mobile target. The expected communication sum-rate over a given flight period is maximized by jointly optimizing the beamforming at the GBS and the UAVs' trajectories, subject to constraints on the average sensing signal-to-noise ratio, the UAVs' flight missions and collision avoidance, and the maximum transmit power at the GBS. With the flight mission given, this is a sequential decision-making problem, so we recast it as a specific Markov decision process (MDP) model called an episode task. Based on this model, we propose a novel LAE-oriented ISAC scheme, referred to as Deep LAE-ISAC (DeepLSC), that leverages deep reinforcement learning (DRL). In DeepLSC, a reward function and a new action selection policy, termed the constrained noise-exploration policy, are designed to satisfy the various constraints. To enable efficient learning in episode tasks, we develop a hierarchical experience replay mechanism, whose gist is to use all experiences generated within each episode to jointly train the neural network. In addition, to speed up the convergence of DeepLSC, we propose a symmetric experience augmentation mechanism, which simultaneously permutes the indexes of all variables to enrich the available experience sets. Simulation results demonstrate that, compared with benchmarks, DeepLSC yields a higher sum-rate while meeting the preset constraints, converges faster, and is more robust across different settings.
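
The symmetric experience augmentation idea can be illustrated with a minimal sketch. The abstract does not give the state or action layout, so the code below assumes, purely for illustration, that each experience stores one feature vector and one action vector per UAV, and that the reward is a shared scalar unaffected by relabeling the UAVs. The names `Experience` and `augment_by_permutation` are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple

Vector = Tuple[float, ...]


class Experience(NamedTuple):
    """One transition; the per-UAV layout is an assumption for illustration."""
    state: Tuple[Vector, ...]       # one feature vector per UAV
    action: Tuple[Vector, ...]      # one action vector per UAV
    reward: float                   # shared scalar, assumed symmetric in UAV indexes
    next_state: Tuple[Vector, ...]  # one feature vector per UAV


def augment_by_permutation(exp: Experience) -> List[Experience]:
    """Return one experience per ordering of the UAV indexes.

    The same ordering is applied to state, action, and next state at once,
    so every copy describes the same physical transition under new labels.
    The identity ordering comes first, so the original is the first element.
    """
    num_uavs = len(exp.state)
    augmented = []
    for order in permutations(range(num_uavs)):
        augmented.append(Experience(
            state=tuple(exp.state[i] for i in order),
            action=tuple(exp.action[i] for i in order),
            reward=exp.reward,
            next_state=tuple(exp.next_state[i] for i in order),
        ))
    return augmented


# Two UAVs with made-up 2-D positions and velocity-like actions.
example = Experience(
    state=((0.0, 0.0), (10.0, 5.0)),
    action=((1.0, 0.0), (0.0, -1.0)),
    reward=3.2,
    next_state=((1.0, 0.0), (10.0, 4.0)),
)
copies = augment_by_permutation(example)
```

With K UAVs this enumeration produces K! copies, so a practical variant would likely sample a few orderings rather than enumerate all of them. Whether DeepLSC enumerates or samples is not stated in the abstract.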
