Unmanned aerial vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize the age of information (AoI) of sensory data, where balancing the trade-off between the UAVs' movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous states and actions, resulting in significant computational complexity. To address practical situations, we propose a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAVs' trajectories and the data collection scheduling of the ground sensors, given mixed continuous and discrete actions. Furthermore, a long short-term memory (LSTM) network is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, compared with the multi-agent deep Q-learning (MADQN) method and a non-learning random algorithm, respectively.
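The average-AoI objective and the non-learning random baseline mentioned above can be illustrated with a minimal sketch. It assumes a slotted-time model that the abstract does not specify: every sensor's AoI grows by one per slot, and the sensor scheduled for collection resets to one. The function names `step_aoi` and `average_aoi_random`, the reset value, and the single-UAV setting are all hypothetical choices made for illustration, not the paper's formulation.

```python
import random


def step_aoi(aoi, scheduled, max_aoi=None):
    """Advance one time slot (assumed model, not the paper's definition).

    Every sensor ages by 1; the scheduled sensor (if any) resets to 1.
    Pass scheduled=None when no sensor is served in this slot.
    """
    nxt = []
    for i, age in enumerate(aoi):
        if i == scheduled:
            nxt.append(1)  # fresh sample collected in this slot
        else:
            new_age = age + 1
            if max_aoi is not None:
                new_age = min(new_age, max_aoi)  # optional cap on AoI
            nxt.append(new_age)
    return nxt


def average_aoi_random(num_sensors, num_slots, seed=0):
    """Time-averaged mean AoI when one sensor is picked uniformly at random per slot."""
    rng = random.Random(seed)
    aoi = [1] * num_sensors
    total = 0.0
    for _ in range(num_slots):
        scheduled = rng.randrange(num_sensors)  # random scheduling baseline
        aoi = step_aoi(aoi, scheduled)
        total += sum(aoi) / num_sensors
    return total / num_slots


if __name__ == "__main__":
    print(average_aoi_random(num_sensors=10, num_slots=1000))
```

A learned policy such as the one described in the abstract would replace the uniform random choice of `scheduled` with a scheduling decision, and would also choose the UAV's continuous movement, which this sketch leaves out.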