The age of information (AoI) measures the freshness of data. In IoT networks, traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication, which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) serving as flying BSs offer many advantages in minimizing AoI, saving energy, and improving throughput. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The framework then optimizes the trajectory of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to find the optimal policy of each UAV. Finally, we design an optimum reward function for the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model in terms of AoI, scheduling accuracy, and transmission power.