This paper investigates goal-oriented communication for the remote estimation of multiple Markov sources in resource-constrained networks. An agent decides when to sample each source and transmits the resulting update packet to a remote destination over an unreliable channel with delay. The destination is tasked with source reconstruction for actuation. We adopt the metric \textit{cost of actuation error} (CAE) to capture state-dependent actuation costs. We seek a sampling policy that minimizes the long-term average CAE subject to an average resource constraint. We formulate this problem as an average-cost constrained Markov Decision Process (CMDP) and relax it into an unconstrained problem using \textit{Lyapunov drift} techniques. We then propose a low-complexity \textit{drift-plus-penalty} (DPP) policy for systems with known source/channel statistics, and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies significantly reduce the number of uninformative transmissions by exploiting the timing of important information.
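To make the drift-plus-penalty idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of a generic DPP scheduler with a single average-resource constraint: each slot, the agent weighs the CAE penalty of each action against a virtual queue `Q` that tracks accumulated constraint violation, and a tunable weight `V` trades off penalty minimization against constraint drift. All function and parameter names here are illustrative assumptions.

```python
def dpp_decide(cae_if_idle, cae_if_tx, Q, V, tx_cost=1.0):
    """Pick the action minimizing V * penalty + Q * resource_use.

    cae_if_idle : expected CAE if no source is sampled this slot
    cae_if_tx   : expected CAE per source if that source is sampled
    Q           : virtual-queue backlog for the resource constraint
    V           : trade-off weight (larger V favors lower CAE over resource savings)
    Returns -1 for "stay idle" or the index of the source to sample.
    """
    best_action, best_score = -1, V * cae_if_idle  # idling consumes no resource
    for i, cae in enumerate(cae_if_tx):
        score = V * cae + Q * tx_cost
        if score < best_score:
            best_action, best_score = i, score
    return best_action


def update_virtual_queue(Q, used, budget):
    """Lyapunov virtual queue: grows when slot resource use exceeds the
    average budget, shrinks (down to zero) otherwise."""
    return max(Q + used - budget, 0.0)
```

When `Q` is small the policy behaves greedily with respect to CAE; as `Q` grows after a run of transmissions, the `Q * tx_cost` term suppresses further sampling, which is how the average resource constraint is enforced without solving the CMDP directly.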