In this paper, we consider a point-to-point integrated sensing and communication (ISAC) system in which a transmitter conveys a message to a receiver over a channel with memory and simultaneously estimates the channel state from the backscattered signals of the emitted waveform. Using Massey's concept of directed information for channels with memory, we formulate the capacity-distortion tradeoff for the ISAC problem when sensing is performed online. Optimizing the transmit waveform to achieve good communication and sensing performance simultaneously is a complicated task, so we propose a deep reinforcement learning (RL) approach to find a solution. The proposed approach enables the agent to optimize ISAC performance by learning from a reward that reflects the difference between the communication gain and the sensing loss. Since the state space in our RL model is a priori unbounded, we employ the deep deterministic policy gradient (DDPG) algorithm. Our numerical results suggest a significant performance improvement when the unbounded state space is considered, as opposed to a simpler RL problem with a reduced state space. In the extreme case of a degenerate state space, only memoryless signaling strategies are possible. Our results thus emphasize the necessity of fully exploiting the memory inherent in ISAC systems.
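For reference, the directed information that the formulation above relies on is commonly defined, following Massey, as shown below. The symbols $X^n$, $Y^n$, and the block length $n$ are notation assumed here for illustration and are not taken from the abstract itself.

\[
I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\bigl(X^i ; Y_i \,\big|\, Y^{i-1}\bigr),
\qquad X^i = (X_1,\dots,X_i), \quad Y^{i-1} = (Y_1,\dots,Y_{i-1}).
\]

Unlike the mutual information $I(X^n; Y^n)$, each term conditions only on inputs up to time $i$, which is what makes the quantity suitable for channels with memory where the input may depend causally on past observations.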