Age of incorrect information (AoII) is a recently proposed freshness and mismatch metric that penalizes an incorrect estimate along with the duration for which it persists. Consequently, tracking AoII requires knowledge of both the source and estimation processes. In this paper, we consider a time-slotted, pull-based remote estimation system under a sampling rate constraint, where the information source is a general discrete-time Markov chain (DTMC). Moreover, packet transmission times from the source to the monitor are non-zero, which prevents the monitor from having perfect information on the actual AoII process at any given time. Hence, for this pull-based system, we propose that the monitor maintain a sufficient statistic called {\em belief}, defined as the joint distribution of the age and source processes given the history of all observations. Using this belief, we first propose a maximum a posteriori (MAP) estimator to be used at the monitor, as opposed to the martingale estimators existing in the literature. Second, we obtain the optimality equations from the belief-MDP (Markov decision process) formulation. Finally, we propose two belief-dependent policies, one based on deep reinforcement learning and the other a threshold policy based on the instantaneous expected AoII.
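To make the belief-based quantities above concrete, the following is a minimal sketch (not the paper's implementation) of how a monitor could use a belief over (source state, age) pairs: the MAP estimate is the source state with the largest marginal belief, and the instantaneous expected AoII is the belief-weighted mean age. The array shapes, the threshold value, and the function names are illustrative assumptions only.

```python
import numpy as np

def map_estimate(belief):
    """MAP estimator: source state maximizing the marginal belief P(X = x).

    belief[x, a] is assumed to hold P(source state = x, age = a | observation history).
    """
    marginal = belief.sum(axis=1)  # marginalize out the age dimension
    return int(np.argmax(marginal))

def expected_aoii(belief, ages):
    """Instantaneous expected AoII: belief-weighted mean of the age values."""
    return float((belief * ages[None, :]).sum())

# Toy belief over 3 source states and ages 0..4, normalized to a distribution.
rng = np.random.default_rng(0)
belief = rng.random((3, 5))
belief /= belief.sum()
ages = np.arange(5)

x_hat = map_estimate(belief)          # MAP estimate of the source state
aoii = expected_aoii(belief, ages)    # instantaneous expected AoII

# A threshold policy (hypothetical threshold) pulls a new sample only when
# the expected AoII exceeds the threshold, respecting the sampling budget.
THRESHOLD = 1.0
pull = aoii > THRESHOLD
```

In the paper's setting the belief would additionally be updated each slot through the DTMC transition kernel and the (delayed) packet observations; the sketch above only shows how the two belief-dependent quantities are read off from a given belief.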