Internet of Things (IoT) networks have become ubiquitous as autonomous computing, communication, and collaboration among devices are increasingly used to accomplish a wide range of tasks. Relays make IoT networks easier to deploy because they provide benefits such as increased communication range and reduced power consumption. Traditional Age of Information (AoI) schedulers for such two-hop relayed IoT networks are limited because they assume constant channel conditions and known (usually generate-at-will) packet generation patterns. Deep reinforcement learning (DRL) algorithms have been investigated for AoI scheduling in two-hop IoT networks with relays; however, they are applicable only to small-scale networks because the action space grows exponentially with network size. These limitations discourage the practical use of AoI schedulers in IoT deployments. This paper presents a practical AoI scheduler for two-hop IoT networks with relays that addresses these limitations. The proposed scheduler uses a novel voting-mechanism-based proximal policy optimization (v-PPO) algorithm that maintains a linear action space, enabling it to scale well to larger IoT networks. The v-PPO-based AoI scheduler adapts to changing network conditions and accounts for unknown traffic generation patterns, making it practical for real-world IoT deployments. Simulation results show that the proposed v-PPO-based AoI scheduler outperforms both ML-based and traditional (non-ML) AoI schedulers, including the Deep Q-Network (DQN)-based AoI scheduler, Maximal Age First-Maximal Age Difference (MAF-MAD), Maximal Age First (MAF), and round-robin, in all considered practical scenarios.
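For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of how AoI evolves under the Maximal Age First (MAF) and round-robin policies. It is not the paper's two-hop model or its v-PPO scheduler: it assumes a simplified single-hop, slotted setting with generate-at-will traffic, one device scheduled per slot, and a fixed per-device delivery probability. All function and parameter names (`step_ages`, `maf_policy`, `round_robin_policy`, `average_aoi`, `success_prob`) are hypothetical and chosen only for illustration.

```python
import random


def step_ages(ages, scheduled, delivered):
    """Advance one slot: every age grows by 1, except that a successful
    delivery resets the scheduled device's age to 1 (assumed convention)."""
    return [1 if (i == scheduled and delivered) else a + 1
            for i, a in enumerate(ages)]


def maf_policy(ages, slot):
    """Maximal Age First: schedule the device with the largest current age
    (ties go to the lowest index)."""
    return max(range(len(ages)), key=lambda i: ages[i])


def round_robin_policy(ages, slot):
    """Round-robin: cycle through devices regardless of their ages."""
    return slot % len(ages)


def average_aoi(policy, success_prob, slots=10_000, seed=0):
    """Simulate `slots` time slots and return the time-averaged per-device AoI.

    success_prob[i] is the assumed probability that a transmission from
    device i is delivered in a slot (a stand-in for channel quality).
    """
    rng = random.Random(seed)
    ages = [1] * len(success_prob)
    total = 0
    for t in range(slots):
        i = policy(ages, t)
        delivered = rng.random() < success_prob[i]
        ages = step_ages(ages, i, delivered)
        total += sum(ages)
    return total / (slots * len(ages))


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.2]  # illustrative values, not taken from the paper
    print("MAF:        ", average_aoi(maf_policy, probs))
    print("Round-robin:", average_aoi(round_robin_policy, probs))
```

Both policies choose one of N devices per slot, so their decision space is linear in N; the exponential growth mentioned in the abstract arises when a learned scheduler must pick a joint action across multiple hops or channels at once.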