This paper studies semantic-aware communication for the remote estimation of multiple Markov sources over a lossy, rate-constrained channel. Most existing studies treat all source states equally. We instead exploit the semantics of information and consider a remote actuator that tolerates estimation errors differently depending on the state. We aim to find an optimal scheduling policy that minimizes the long-term state-dependent cost of estimation errors under a transmission frequency constraint. Using average-cost Constrained Markov Decision Process (CMDP) theory and Lagrangian dynamic programming, we theoretically characterize the structure of the optimal policy. Building on these structural results, we develop a novel policy search algorithm, termed intersection search plus relative value iteration (Insec-RVI), that finds the optimal policy in only a few iterations. To avoid the ``curse of dimensionality'' of MDPs, we propose an online, low-complexity drift-plus-penalty (DPP) scheduling algorithm based on Lyapunov optimization theory. We also design an efficient average-cost Q-learning algorithm that estimates the optimal policy without a priori knowledge of the channel and source statistics. Numerical results show that continuous transmission is inefficient. Remarkably, our semantic-aware policies attain the optimum with strategically fewer transmissions by exploiting the timing of important information.
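The abstract names a drift-plus-penalty (DPP) scheduler but does not give its form. The following is a minimal Python sketch of a generic DPP rule, not the paper's algorithm. It makes the following assumptions of our own: binary Markov sources, at most one transmission per slot, a hypothetical state-dependent cost table `cost[i][x][xhat]`, and illustrative flip and success probabilities. The names `dpp_decide`, `update_queue` and `simulate` are hypothetical. The frequency constraint is enforced with a virtual queue Q(t+1) = max(Q(t) + a(t) - F_max, 0). In each slot the rule picks the action that minimizes V * E[cost] + Q(t) * a(t).

```python
import random
from typing import List, Optional, Sequence, Tuple


def dpp_decide(costs: Sequence[float], success_probs: Sequence[float],
               queue: float, v: float) -> Optional[int]:
    """Return the index of the source to transmit, or None to stay silent.

    Minimizes V * E[estimation cost after this slot] + Q * a over the actions
    'silent' (a = 0) and 'send source i' (a = 1). Sending source i removes its
    current error cost c_i with probability p_i, so the best candidate is the
    source maximizing p_i * c_i, and it is sent iff V * p_i * c_i > Q.
    """
    best = max(range(len(costs)), key=lambda i: success_probs[i] * costs[i])
    gain = v * success_probs[best] * costs[best]
    return best if gain > queue else None


def update_queue(queue: float, sent: bool, f_max: float) -> float:
    """Virtual queue for the constraint: time-average transmissions <= f_max."""
    return max(queue + (1.0 if sent else 0.0) - f_max, 0.0)


def simulate(num_slots: int = 10_000, f_max: float = 0.2, v: float = 10.0,
             seed: int = 0) -> Tuple[float, float, float]:
    """Run the DPP rule on two hypothetical binary sources.

    Returns (transmission rate, average estimation cost, final queue).
    """
    rng = random.Random(seed)
    flip = [0.1, 0.3]    # hypothetical per-slot state flip probabilities
    p_succ = [0.8, 0.6]  # hypothetical channel success probabilities
    # Hypothetical semantic costs cost[i][x][xhat]: missing state 1 hurts more.
    cost: List[List[List[float]]] = [[[0.0, 1.0], [5.0, 0.0]],
                                     [[0.0, 1.0], [2.0, 0.0]]]
    x, xhat = [0, 0], [0, 0]
    queue, sends, total_cost = 0.0, 0, 0.0
    for _ in range(num_slots):
        for i in range(2):  # each source evolves as a two-state Markov chain
            if rng.random() < flip[i]:
                x[i] = 1 - x[i]
        c = [cost[i][x[i]][xhat[i]] for i in range(2)]
        choice = dpp_decide(c, p_succ, queue, v)
        sent = choice is not None
        if sent and rng.random() < p_succ[choice]:  # lossy channel
            xhat[choice] = x[choice]
        sends += 1 if sent else 0
        total_cost += sum(cost[i][x[i]][xhat[i]] for i in range(2))
        queue = update_queue(queue, sent, f_max)
    return sends / num_slots, total_cost / num_slots, queue
```

Summing the queue update over T slots gives the bound (1/T) * sum a(t) <= F_max + Q(T)/T. A bounded queue therefore implies that the frequency constraint holds asymptotically. The weight V trades estimation cost against how tightly transmissions track F_max.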