This paper investigates goal-oriented communication for remote estimation of multiple Markov sources in resource-constrained networks. An agent decides the order in which the sources are updated and transmits the corresponding packets to a remote destination over an unreliable channel with delay. The destination reconstructs the sources for the purpose of actuation. We use the cost of actuation error (CAE) metric to capture the significance (semantics) of errors at the point of actuation. Our goal is to find an optimal sampling policy that minimizes the time-averaged CAE subject to average resource constraints. We formulate this problem as an average-cost constrained Markov decision process (CMDP) and transform it into an unconstrained MDP using Lyapunov drift techniques. We then propose a low-complexity drift-plus-penalty (DPP) policy for systems with known source and channel statistics, and a Lyapunov optimization-based deep reinforcement learning (LO-DRL) policy for unknown environments. Our policies achieve near-optimal performance in CAE minimization and significantly reduce the number of uninformative transmissions.
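To make the drift-plus-penalty idea concrete, the following is a minimal sketch of a generic per-slot rule in the style of standard Lyapunov optimization. It is not the paper's DPP policy: the names `dpp_action`, `update_queue`, and `simulate`, the weight `v`, and the fixed per-action expected CAE values are all hypothetical and chosen for illustration only. The sketch assumes that a one-step expected CAE is available for each action (index 0 means staying idle) and that the average resource constraint is tracked by a single virtual queue.

```python
def dpp_action(expected_cae, tx_cost, queue, v):
    """Return the index of the action minimizing v * penalty + queue * cost.

    expected_cae[a]: assumed one-step expected CAE under action a
    tx_cost[a]:      resource consumed by action a (0 for staying idle)
    queue:           current virtual-queue backlog (constraint pressure)
    v:               trade-off weight between CAE and constraint violation
    """
    scores = [v * e + queue * c for e, c in zip(expected_cae, tx_cost)]
    return min(range(len(scores)), key=scores.__getitem__)


def update_queue(queue, used, budget):
    """Virtual queue: grows when a slot uses more than the average budget."""
    return max(queue + used - budget, 0.0)


def simulate(slots, budget=0.3, v=1.0):
    """Run the rule with fixed, made-up costs; return average resource use."""
    expected_cae = [5.0, 1.0, 2.0]  # hypothetical: idle, send source 1, send source 2
    tx_cost = [0.0, 1.0, 1.0]       # hypothetical: each transmission costs one unit
    queue, used_total = 0.0, 0.0
    for _ in range(slots):
        action = dpp_action(expected_cae, tx_cost, queue, v)
        used_total += tx_cost[action]
        queue = update_queue(queue, tx_cost[action], budget)
    return used_total / slots


if __name__ == "__main__":
    print(simulate(10_000))
```

In this toy setting the rule transmits while the virtual queue is small and stays idle once the backlog outweighs the CAE reduction, so the long-run average resource use settles close to `budget`. A larger `v` puts more weight on CAE and lets the queue grow further before transmissions are suppressed.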