We consider the problem of optimizing the decisions of a preemption-capable transmitter to minimize the Age of Incorrect Information (AoII) when the communication channel has a random delay. In the system, a transmitter observes a Markovian source and makes decisions based on the system status. Time is slotted and normalized. In each time slot, when the channel is busy, the transmitter decides whether to preempt the ongoing transmission or to skip; when the channel is idle, it decides whether to send a new update. At the other end of the channel, a receiver estimates the state of the Markovian source based on the updates it receives. We consider a generic transmission delay and assume that the delays of different updates are independent and identically distributed. This paper aims to optimize the transmitter's decision in each time slot to minimize the AoII under generic time penalty functions. To this end, we first formulate the optimization problem as a Markov decision process and derive analytical expressions for the expected AoIIs achieved by two canonical preemptive policies. Then, we prove the existence of the optimal policy and provide a feasible value iteration algorithm to approximate it. However, the value iteration algorithm becomes computationally expensive when high confidence in the approximation is required. Therefore, we analyze the system characteristics under two canonical delay distributions and theoretically obtain the corresponding optimal policies using the policy improvement theorem. Finally, numerical results are presented to illustrate the performance improvements brought about by the preemption capability.
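To make the metric concrete, the following minimal sketch computes an AoII trace from a source trajectory and a receiver estimate trajectory. It assumes the common convention that the AoII in a slot is a time penalty applied to the number of consecutive slots in which the estimate has been incorrect, and that it resets to zero whenever the estimate matches the source. The function name `aoii_trace`, its parameters, and the example trajectories are illustrative and not taken from the paper.

```python
from typing import Callable, List, Sequence


def aoii_trace(
    source: Sequence[int],
    estimate: Sequence[int],
    penalty: Callable[[int], float] = lambda s: s,
) -> List[float]:
    """Return the per-slot AoII for aligned source/estimate trajectories.

    Assumption (not from the paper): AoII is penalty(s), where s is the
    number of consecutive slots, up to and including the current one, in
    which the estimate differs from the source; s is 0 when they match.
    """
    elapsed = 0  # slots since the estimate was last correct
    trace = []
    for x, x_hat in zip(source, estimate):
        elapsed = 0 if x == x_hat else elapsed + 1
        trace.append(penalty(elapsed))
    return trace


# Hypothetical trajectories of a binary source and the receiver's estimate.
source = [0, 1, 1, 1, 0, 0]
estimate = [0, 0, 0, 1, 1, 0]

linear = aoii_trace(source, estimate)
quadratic = aoii_trace(source, estimate, penalty=lambda s: s ** 2)
```

Passing a different `penalty` callable is one way to represent the generic time penalty functions mentioned above; the default is the linear penalty.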