We consider a transmitter-receiver pair in a slotted-time system. The transmitter observes a dynamic source and sends updates to a remote receiver through an error-free communication channel that introduces a random delay. We consider two cases. In the first case, the update is guaranteed to be delivered within a certain number of time slots. In the second case, the update is immediately discarded once the transmission time exceeds a predetermined value. The receiver estimates the state of the dynamic source using the received updates. In this paper, we adopt the Age of Incorrect Information (AoII) as the performance metric and investigate the problem of optimizing the transmitter's action in each time slot to minimize AoII. We first formulate the optimization problem as a Markov decision process and investigate the performance of the threshold policy, under which the transmitter transmits updates only when transmission is allowed and the AoII exceeds the threshold $\tau$. By examining the characteristics of the system evolution, we compute the exact expected AoII achieved by the threshold policy using a Markov chain. Then, we prove that the optimal policy exists and provide a computable relative value iteration algorithm to estimate it. Furthermore, by leveraging the policy improvement theorem, we prove that, under an easily verifiable condition, the optimal policy is the threshold policy with $\tau=1$. Finally, numerical results are presented to highlight the performance of the optimal policy.
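To illustrate the relative value iteration step mentioned above, the following is a minimal sketch on a toy model, not the system analyzed in the paper. Everything in it is an assumption made for illustration: the source is a symmetric binary Markov chain that flips with probability `p`, a transmitted update takes one slot and is delivered with probability `q` (a crude stand-in for the random delay), the AoII is truncated at `n_max`, and the per-slot cost equals the current AoII. The function names `build_toy_model` and `relative_value_iteration` are hypothetical.

```python
import numpy as np


def build_toy_model(n_max=50, p=0.2, q=0.6):
    """Toy truncated AoII model (illustrative assumptions only).

    State s in {0, ..., n_max} is the AoII. Action 0 = idle, 1 = transmit.
    p: probability that the binary source flips in a slot.
    q: probability that a transmitted update is delivered within the slot.
    """
    n = n_max + 1
    P = np.zeros((2, n, n))
    for a in (0, 1):
        # The estimate is correct: it becomes wrong only if the source flips.
        P[a, 0, 0] = 1 - p
        P[a, 0, 1] = p
        for s in range(1, n):
            # Probability that the estimate is correct at the next slot.
            reset = p if a == 0 else p + q * (1 - 2 * p)
            nxt = min(s + 1, n_max)  # truncate the AoII at n_max
            P[a, s, 0] += reset
            P[a, s, nxt] += 1 - reset
    # Per-slot cost is the AoII itself, independent of the action.
    cost = np.tile(np.arange(n, dtype=float), (2, 1))
    return P, cost


def relative_value_iteration(P, cost, ref=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for an average-cost MDP.

    P has shape (A, S, S), cost has shape (A, S). Returns the estimated
    average cost, the relative value function, and a greedy policy.
    """
    h = np.zeros(P.shape[1])
    gain = 0.0
    for _ in range(max_iter):
        q_values = cost + P @ h      # shape (A, S)
        th = q_values.min(axis=0)    # Bellman operator applied to h
        gain = th[ref]               # value at the reference state
        h_new = th - gain            # keep h(ref) pinned at zero
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = (cost + P @ h).argmin(axis=0)
    return gain, h, policy


if __name__ == "__main__":
    P, cost = build_toy_model()
    gain, h, policy = relative_value_iteration(P, cost)
    print("estimated average AoII:", gain)
    print("greedy action for AoII 0..5:", policy[:6])
```

In this toy model with `p < 0.5` and no transmission cost, transmitting whenever the AoII is positive can never hurt, so the greedy policy returned by the sketch should transmit in every state with AoII of at least 1. This mirrors the threshold structure with $\tau=1$ discussed in the abstract, but it does not reproduce the paper's delay model or its verifiable condition.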