Reinforcement Learning for Minimizing Age of Information in Real-time Internet of Things Systems with Realistic Physical Dynamics

In this paper, the problem of minimizing the weighted sum of age of information (AoI) and total energy consumption of Internet of Things (IoT) devices is studied. In the considered model, each IoT device monitors a physical process that follows nonlinear dynamics. As the dynamics of the physical process vary over time, each device must find an optimal sampling frequency to sample the real-time dynamics of the physical system and send the sampled information to a base station (BS). Due to limited wireless resources, the BS can only select a subset of devices to transmit their sampled information. Meanwhile, changing the sampling frequency also affects the energy that each device uses for sampling and information transmission. Thus, it is necessary to jointly optimize the sampling policy of each device and the device selection scheme of the BS so as to accurately monitor the dynamics of the physical process using minimum energy. This problem is formulated as an optimization problem whose goal is to minimize the weighted sum of AoI cost and energy consumption. To solve this problem, a distributed reinforcement learning approach is proposed to optimize the sampling policy. The proposed learning method enables the IoT devices to find the optimal sampling policy using their local observations. Given the sampling policy, the device selection scheme can be optimized so as to minimize the weighted sum of AoI and energy consumption of all devices. Simulations with real PM 2.5 pollution data show that the proposed algorithm can reduce the sum of AoI by up to 17.8% and 33.9%, and the total energy consumption by up to 13.2% and 35.1%, compared to a conventional deep Q-network method and a uniform sampling policy, respectively.
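To make the trade-off in the objective concrete, the following is a minimal sketch of a weighted AoI-plus-energy cost for a single device. It is an illustration under stated assumptions and not the formulation used in the paper: it assumes the common slotted, linear AoI model (age resets to 1 when a fresh sample is delivered and grows by 1 otherwise), a fixed energy cost per delivered sample, and a scalar weight in [0, 1]. The names `aoi_trajectory`, `weighted_cost`, `energy_per_update`, and `weight` are hypothetical.

```python
from typing import List, Sequence


def aoi_trajectory(updates: Sequence[bool], initial_age: int = 1) -> List[int]:
    """Return the age in each slot under an assumed linear AoI model.

    updates[t] is True when a fresh sample is delivered in slot t.
    The age resets to 1 on delivery and otherwise grows by 1.
    """
    ages = []
    age = initial_age
    for delivered in updates:
        age = 1 if delivered else age + 1
        ages.append(age)
    return ages


def weighted_cost(updates: Sequence[bool],
                  energy_per_update: float,
                  weight: float) -> float:
    """Weighted sum of total AoI and total energy (both are assumptions).

    weight scales the AoI term and (1 - weight) scales the energy term.
    """
    total_aoi = sum(aoi_trajectory(updates))
    total_energy = energy_per_update * sum(1 for u in updates if u)
    return weight * total_aoi + (1.0 - weight) * total_energy


# Sampling in every slot keeps information fresh but spends more energy;
# sampling every other slot saves energy at the price of a higher age.
frequent = [True, True, True, True]
sparse = [True, False, True, False]
cost_frequent = weighted_cost(frequent, energy_per_update=2.0, weight=0.5)
cost_sparse = weighted_cost(sparse, energy_per_update=2.0, weight=0.5)
```

With these illustrative numbers the sparse schedule has the lower cost, but the ranking flips as `weight` moves toward 1, which is the tension a learned sampling policy has to resolve.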
