Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach

This paper studies an integrated sensing and communications (ISAC) system for the low-altitude economy (LAE), in which a ground base station (GBS) provides communication and navigation services to authorized unmanned aerial vehicles (UAVs) while sensing the low-altitude airspace to monitor an unauthorized mobile target. The expected communication sum-rate over a given flight period is maximized by jointly optimizing the beamforming at the GBS and the UAVs' trajectories, subject to constraints on the average signal-to-noise ratio required for sensing, the UAVs' flight missions and collision avoidance, and the maximum transmit power at the GBS. This is a sequential decision-making problem with a given flight mission, so we cast it as a specific Markov decision process (MDP) model known as an episodic task. Based on this model, we propose a novel LAE-oriented ISAC scheme, referred to as Deep LAE-ISAC (DeepLSC), which leverages deep reinforcement learning (DRL). In DeepLSC, a reward function and a new action selection policy, termed the constrained noise-exploration policy, are designed so that the learned solution satisfies the constraints above. To enable efficient learning in episodic tasks, we develop a hierarchical experience replay mechanism whose core idea is to use all experiences generated within each episode to jointly train the neural network. In addition, to speed up the convergence of DeepLSC, we propose a symmetric experience augmentation mechanism that simultaneously permutes the indices of all variables to enrich the set of available experiences. Simulation results demonstrate that, compared with the benchmarks, DeepLSC achieves a higher sum-rate while meeting the preset constraints, converges faster, and is more robust to different settings.
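The symmetric experience augmentation idea can be illustrated with a minimal sketch. The abstract does not specify the data layout, so the following assumes (hypothetically) that each experience stores one entry per UAV in its state, action, and next state, and that the reward is a scalar that does not depend on how the UAVs are labeled, as a sum-rate would not. The function name `augment_by_permutation` and the tuple layout are illustrative choices, not the paper's implementation.

```python
from itertools import permutations


def augment_by_permutation(state, action, reward, next_state):
    """Return every UAV-index permutation of a single transition.

    Assumptions (not taken from the paper):
    - state, action, next_state are sequences with one entry per UAV,
      where position k belongs to UAV k;
    - reward is a scalar that is invariant to relabeling the UAVs.
    """
    num_uavs = len(state)
    augmented = []
    for perm in permutations(range(num_uavs)):
        # Apply the SAME permutation to every per-UAV quantity so that
        # the permuted transition stays internally consistent.
        augmented.append((
            [state[k] for k in perm],
            [action[k] for k in perm],
            reward,
            [next_state[k] for k in perm],
        ))
    return augmented
```

With K UAVs this produces K! transitions from one real experience, so a practical version would likely sample a few permutations rather than enumerate all of them when K is large.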
