In this paper, a novel joint energy and age of information (AoI) optimization framework for IoT devices in a non-stationary environment is presented. In particular, IoT devices that are distributed in the real world are required to efficiently utilize their computing resources so as to balance the freshness of their data against their energy consumption. To optimize the performance of IoT devices in such a dynamic setting, a novel lifelong reinforcement learning (RL) solution is proposed that enables IoT devices to continuously adapt their policies to each newly encountered environment. Given that IoT devices have limited energy and computing resources, an unmanned aerial vehicle (UAV) is leveraged to visit the IoT devices and update the policy of each device sequentially. As such, the UAV is exploited as a mobile learning agent that learns a shared knowledge base with a feature base in its training phase, and learns the feature sets of a zero-shot learning method in its testing phase, so as to generalize across environments. To optimize the trajectory and flying velocity of the UAV, an actor-critic network is leveraged so as to minimize the UAV energy consumption. Simulation results show that the proposed lifelong RL solution can outperform state-of-the-art benchmarks, improving the balanced cost of IoT devices by $8.3\%$ when incorporating warm-start policies for unseen environments. In addition, the proposed solution achieves up to a $49.38\%$ reduction in UAV energy consumption in comparison to a random flying strategy.