We consider a status update system consisting of a sampler, a sink, and a controller located at the sink. The controller sends requests to the sampler to generate and transmit status updates. Packet transmissions from the controller to the sampler (reverse link) and from the sampler to the sink (forward link) experience random delays. The reverse and forward links are modeled as servers with geometric service times, referred to as the controller and sampler servers, respectively. Each server is equipped with a single buffer that stores an arriving packet when the server is busy. We adopt a preemption-in-waiting policy on both links, whereby an arriving packet replaces the packet in the buffer whenever the buffer is full. Our main goal is to determine the optimal generation times of request packets at the controller in order to minimize the long-term average age of information (AoI) at the sink. We formulate the problem as a Markov decision process (MDP) and derive the optimal stationary deterministic policy using the relative value iteration (RVI) algorithm. We prove the convergence of the algorithm. Numerical results show that the proposed system consistently outperforms baseline policies from prior work and reveal a threshold-based structure for the optimal policy.
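As a rough illustration of the solution method named in the abstract, the following is a minimal sketch of relative value iteration on a deliberately simplified toy problem. It is not the two-server model with buffers and preemption described above. The toy assumes a single age state capped at `max_age`, a request that resets the age with probability `p_success`, and a per-request cost `request_cost`; all three, along with the function names, are hypothetical choices made only so that the trade-off is nontrivial.

```python
import numpy as np


def relative_value_iteration(P, cost, ref_state=0, tol=1e-9, max_iter=100_000):
    """Tabular RVI for an average-cost MDP.

    P    : array (A, S, S), P[a, s, t] = Pr(next = t | state = s, action = a)
    cost : array (S, A), one-step cost of taking action a in state s
    Returns (estimated average cost, relative value function, greedy policy).
    """
    h = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        # q[s, a] = cost[s, a] + sum_t P[a, s, t] * h[t]
        q = cost + np.einsum("ast,t->sa", P, h)
        v = q.min(axis=1)
        gain = v[ref_state]      # value at the reference state estimates the average cost
        h_new = v - gain         # subtracting it keeps the iterates bounded
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return gain, h, q.argmin(axis=1)


def build_toy_aoi_mdp(max_age=30, p_success=0.6, request_cost=4.0):
    """Hypothetical toy model: index s stands for age s + 1.

    Action 0 (idle): the age grows by one slot, capped at max_age.
    Action 1 (request): with probability p_success the age resets to 1,
    otherwise it grows as under idling. The request cost is an assumption
    of this sketch, not part of the abstract's model.
    """
    n = max_age
    P = np.zeros((2, n, n))
    cost = np.zeros((n, 2))
    for s in range(n):
        nxt = min(s + 1, n - 1)
        P[0, s, nxt] = 1.0
        P[1, s, 0] += p_success
        P[1, s, nxt] += 1.0 - p_success
        cost[s, 0] = s + 1
        cost[s, 1] = s + 1 + request_cost
    return P, cost


if __name__ == "__main__":
    P, cost = build_toy_aoi_mdp()
    gain, h, policy = relative_value_iteration(P, cost)
    print("estimated average cost:", gain)
    print("greedy action per age (0 = idle, 1 = request):", policy)
```

Printing the greedy action per age makes it easy to check whether the toy policy switches from idling to requesting at a single age, which is the kind of threshold structure the abstract reports for the full model. The full model would need a larger state (for example, the ages of packets in service and in the buffers on both links), which this sketch does not attempt.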