This paper studies a continuous-time joint sampling-and-preemption problem, incorporating sampling and preemption penalties under general service-time distributions. We formulate the system as an impulse-controlled piecewise-deterministic Markov process (PDMP) and derive coupled integral average-cost optimality equations via the dynamic programming principle, thereby avoiding the smoothness assumptions typically required for an average-cost Hamilton-Jacobi-Bellman quasi-variational inequality (HJB-QVI) characterization. A key invariance in the busy phase collapses the dynamics onto a one-dimensional busy-start boundary, reducing preemption control to an optimal stopping problem. Building on this structure, we develop an efficient policy iteration algorithm with heavy-tail acceleration, employing a hybrid (uniform/log-spaced) action grid and a far-field linear closure. Simulations under Pareto and log-normal service times demonstrate substantial improvements over AoI-optimal non-preemptive sampling and zero-wait baselines, achieving up to a 30x reduction in average cost in heavy-tailed regimes. Finally, the simulations reveal a counterintuitive effect: under preemption, delay variance, typically a liability, can become a strategic advantage for information freshness.