Unmanned aerial vehicles (UAVs) are seen as a promising technology for a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive and maintaining its timeliness is critical. Our objective is to jointly design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear program (MINLP) under time and quality-of-service constraints. To solve the resulting optimization problem efficiently, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an approach based on the popular on-policy Reinforcement Learning (RL) algorithm Proximal Policy Optimization (PPO). Our approach leverages the centralized training and decentralized execution (CTDE) framework, in which the UAVs learn their individual policies while a centralized value function is trained. Our simulation results show that the proposed multi-agent PPO (MAPPO) approach reduces the global AoU by at least half compared to conventional off-policy reinforcement learning approaches.
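To make the PPO ingredient concrete, the following minimal sketch computes the standard clipped surrogate objective that PPO maximizes. It is an illustration under stated assumptions, not the authors' implementation: the function name `ppo_clipped_objective`, the plain-list inputs, and the clipping value `eps=0.2` are hypothetical choices. In a CTDE setting one would typically expect the probability ratios to come from each UAV's own policy and the advantage estimates to be derived from the centralized value function.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios     : new-policy / old-policy action probabilities, one per sample
    advantages : advantage estimates for the same samples (assumed here to be
                 computed elsewhere, e.g. from a centralized critic)
    eps        : clipping range; 0.2 is an assumed, commonly used value
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    terms = []
    for r, a in zip(ratios, advantages):
        unclipped = r * a
        clipped = clip(r, 1.0 - eps, 1.0 + eps) * a
        # Taking the minimum removes any incentive to push the ratio
        # outside the interval [1 - eps, 1 + eps].
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Three hypothetical samples gathered by one UAV agent.
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

The clipping is what keeps each policy update close to the policy that collected the data, which is why PPO is classed as an on-policy method.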