AoI-based Scheduling of Correlated Sources for Timely Inference

We investigate a real-time remote inference system where multiple correlated sources transmit observations over a communication channel to a receiver. The receiver utilizes these observations to infer multiple time-varying targets. Due to limited communication resources, the delivered observations may not be fresh. To quantify data freshness, we employ the Age of Information (AoI) metric. To minimize the inference error, we aim to design a signal-agnostic scheduling policy that leverages AoI without requiring knowledge of the actual target values or the source observations. This scheduling problem is a restless multi-armed bandit (RMAB) problem with a non-separable penalty function. Unlike traditional RMABs, the correlation among sources introduces a unique challenge: the penalty function of each source depends on the AoI of other correlated sources, preventing the problem from decomposing into multiple independent Markov Decision Processes (MDPs), a key step in applying traditional RMAB solutions. To address this, we propose a novel approach that approximates the penalty function for each source and establishes an analytical bound on the approximation error. We then develop scheduling policies for two scenarios: (i) full knowledge of the penalty functions and (ii) no knowledge of the penalty functions. For the case of known penalty functions, we present an upper bound on the optimality gap that highlights the impact of the correlation parameter and the system size. For the case of unknown penalty functions and signal distributions, we develop an online learning approach that utilizes bandit feedback to learn an online Maximum Gain First policy. Simulation results demonstrate the effectiveness of our proposed policies in minimizing inference error and achieving scalability in the number of sources.
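
To make the AoI dynamics and the idea of a gain-ordered schedule concrete, the following is a minimal illustrative sketch, not the policy developed in this work. It assumes a simplified setting in which each source has its own separable penalty function of its AoI, whereas the problem studied here has a non-separable penalty. It also assumes that every scheduled transmission is delivered within one slot. The names `step_aoi`, `max_gain_first`, `penalties`, and `num_channels` are hypothetical and chosen only for illustration.

```python
import heapq


def step_aoi(ages, scheduled):
    """Advance AoI by one slot.

    Assumption: a scheduled source is delivered within the slot, so its
    AoI resets to 1; every other source ages by one slot.
    """
    return [1 if i in scheduled else a + 1 for i, a in enumerate(ages)]


def max_gain_first(ages, penalties, num_channels):
    """Greedy rule: schedule the sources with the largest one-slot gain.

    The gain of source i is the penalty it would incur next slot if left
    unscheduled, p_i(age + 1), minus the penalty if updated, p_i(1).
    Each penalty p_i is a hypothetical function of that source's AoI only.
    """
    gains = [
        (p(a + 1) - p(1), i)
        for i, (a, p) in enumerate(zip(ages, penalties))
    ]
    top = heapq.nlargest(num_channels, gains)
    return {i for _, i in top}


if __name__ == "__main__":
    # Three sources with made-up linear penalties of different slopes.
    ages = [3, 1, 5]
    penalties = [lambda a: a, lambda a: 2 * a, lambda a: 0.5 * a]
    chosen = max_gain_first(ages, penalties, num_channels=1)
    print(chosen, step_aoi(ages, chosen))
```

With a single channel, the sketch selects the source whose update yields the largest penalty reduction in the next slot. Capturing correlation would require each penalty to depend on the AoI of other sources as well, which this simplified version deliberately omits.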
