This paper investigates goal-oriented communication for the remote estimation of multiple Markov sources in resource-constrained networks. An agent decides when to sample each source and transmits update packets to a remote destination over an unreliable channel with delay. The destination is tasked with source reconstruction for actuation. We utilize the metric \textit{cost of actuation error} (CAE) to capture state-dependent actuation costs. We aim for a sampling policy that minimizes the long-term average CAE subject to an average resource constraint. We formulate this problem as an average-cost constrained Markov Decision Process (CMDP) and relax it into an unconstrained problem using \textit{Lyapunov drift} techniques. We then propose a low-complexity \textit{drift-plus-penalty} (DPP) policy for systems with known source/channel statistics, and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies significantly reduce the number of uninformative transmissions by exploiting the timing of important information.