For status update systems operating over unreliable, energy-constrained wireless channels, we address Weaver's long-standing Level-C question: do my packets actually improve the plant's behavior? Each fresh sample carries a stochastic expiration time, governed by the plant's instability dynamics, after which the information becomes useless for control. Casting the problem as a coupon-collector variant with expiring coupons, we (i) formulate a two-dimensional average-reward Markov decision process (MDP), (ii) prove that the optimal schedule has a double-threshold structure in the receiver's freshness timer and the sender's stored lifetime, (iii) derive a closed-form policy for deterministic lifetimes, and (iv) design a Structure-Aware Q-learning algorithm (SAQ) that learns the optimal policy without knowing the channel success probability or the lifetime distribution. Simulations validate the theoretical predictions: SAQ matches the performance of optimal Value Iteration while converging significantly faster than baseline Q-learning, and expiration-aware scheduling achieves up to 50% higher reward than age-based baselines by adapting transmissions to state-dependent urgency, thereby delivering Level-C effectiveness under tight resource constraints.