Learning-Based Sensor Scheduling for Delay-Aware and Stable Remote State Estimation

Unpredictable sensor-to-estimator delays fundamentally change what matters in wireless remote state estimation: not just freshness, but how delay interacts with sensor informativeness and energy efficiency. In this paper, we present a unified, delay-aware framework that models this coupling explicitly and quantifies a delay-dependent information gain, motivating an information-per-joule scheduling objective that goes beyond age-of-information (AoI) proxies. To this end, we first introduce an efficient posterior-fusion update that incorporates delayed measurements without state augmentation and provides a consistent approximation to the optimal delayed Kalman update. We then derive tractable stability conditions under which bounded estimation error is achievable with stochastic, delayed scheduling. These conditions highlight the need for unstable modes to be observable across sensors. Building on this foundation, we cast scheduling as a Markov decision process (MDP) and develop a proximal policy optimization (PPO) scheduler that learns directly from interaction, requires no prior delay model, and explicitly trades off estimation accuracy, freshness, sensor heterogeneity, and transmission energy through normalized rewards. In simulations with heterogeneous sensors, realistic link-energy models, and random delays, the proposed method learns stably and consistently achieves lower estimation error than random scheduling and strong reinforcement learning (RL) baselines (DQN, A2C) at comparable energy, while remaining robust to variations in measurement availability and process/measurement noise.
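
To give some intuition for why a delay-dependent information gain and an information-per-joule objective can rank sensors differently from freshness alone, here is a minimal Python sketch. It assumes a scalar linear-Gaussian system x+ = a x + w, y = c x + v. It also assumes that the gain of a measurement is the Gaussian entropy reduction it produces at the current step, and that nothing else is fused between sampling and arrival. The helper names (`propagate`, `delayed_info_gain`, `pick_sensor`), the sensor tuple layout, and all numbers are hypothetical. The greedy rule is only an illustration. It is not the paper's PPO scheduler or its posterior-fusion update.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sensor description: (c, r, energy_joules, delay_steps).
Sensor = Tuple[float, float, float, int]


def propagate(p: float, a: float, q: float, steps: int) -> float:
    """Open-loop variance propagation P <- a^2 P + q, applied `steps` times."""
    for _ in range(steps):
        p = a * a * p + q
    return p


def delayed_info_gain(p_then: float, a: float, q: float,
                      c: float, r: float, delay: int) -> float:
    """Information gain (nats) at the current step from a measurement taken
    `delay` steps ago, for the scalar model x+ = a x + w, y = c x + v.

    Assumes w ~ N(0, q), v ~ N(0, r), that p_then is the prior variance at the
    sampling time, and that nothing else was fused since then.
    """
    # Standard scalar Kalman measurement update at the sampling time.
    p_post_then = p_then - (p_then * c) ** 2 / (c * c * p_then + r)
    # Carry the variance with and without this measurement forward to now.
    p_now_without = propagate(p_then, a, q, delay)
    p_now_with = propagate(p_post_then, a, q, delay)
    # Entropy reduction of a Gaussian: 0.5 * ln(P_without / P_with).
    return 0.5 * math.log(p_now_without / p_now_with)


def pick_sensor(sensors: Dict[str, Sensor], p_hist: List[float],
                a: float, q: float) -> str:
    """Greedy information-per-joule choice; p_hist[d] is the prior variance d steps ago."""
    def score(s: Sensor) -> float:
        c, r, energy_j, delay = s
        return delayed_info_gain(p_hist[delay], a, q, c, r, delay) / energy_j
    return max(sensors, key=lambda name: score(sensors[name]))


# Illustrative numbers only (not taken from the paper).
A, Q = 0.95, 0.1
history = [1.0]  # history[0]: prior variance 3 steps ago; each later entry is one step newer
for _ in range(3):
    history.append(propagate(history[-1], A, Q, 1))
P_HIST = history[::-1]  # P_HIST[d] = prior variance d steps ago
SENSORS = {
    "fresh_costly": (1.0, 0.5, 2.0, 0),
    "stale_cheap": (1.0, 0.5, 0.5, 3),
}
print(pick_sensor(SENSORS, P_HIST, A, Q))  # -> stale_cheap
```

With these numbers, the fresh sensor yields about 0.55 nats for 2 J. The three-step-stale sensor yields about 0.33 nats for 0.5 J. The per-joule rule therefore picks `stale_cheap`, whereas a pure freshness (AoI-style) rule would favor the fresh reading. In this scalar model, the gain from a given measurement also shrinks monotonically as its delay grows, for stable and unstable `a` alike. The scheduler described above learns such trade-offs with PPO from normalized rewards rather than computing them greedily.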
