In the realm of emerging real-time networked applications such as cyber-physical systems (CPS), the Age of Information (AoI) has emerged as a pivotal metric for evaluating information timeliness. To meet high computational demands, such as those of intelligent manufacturing in CPS, mobile edge computing (MEC) offers a promising way to optimize computation and reduce AoI. In this work, we study the timeliness of computation-intensive updates and jointly optimize the task updating and offloading policies to minimize AoI. Specifically, we account for edge load dynamics and formulate a task scheduling problem to minimize the expected time-average AoI. The fractional objective introduced by AoI and the semi-Markov game nature of the problem make this challenge particularly difficult, and existing approaches are not directly applicable. To this end, we present a comprehensive framework for fractional reinforcement learning (RL). We first introduce a fractional single-agent RL framework and prove its linear convergence. We then extend it to a fractional multi-agent RL framework with a convergence analysis. To tackle the challenge of asynchronous control in semi-Markov games, we further design an asynchronous model-free fractional multi-agent RL algorithm in which each device makes scheduling decisions over a hybrid action space without knowing the system dynamics or the decisions of other devices. Experimental results show that our proposed algorithms reduce the average AoI by up to 52.6% compared with the best baseline in our experiments.
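Fractional (ratio-form) objectives such as the time-average AoI are commonly handled with Dinkelbach-style iterations, which alternately solve a parametrized subproblem and update the ratio estimate. The following is a minimal toy sketch of that structure, with hypothetical numerator/denominator functions standing in for the paper's actual AoI objective, not the proposed fractional RL algorithm itself.

```python
import numpy as np

def dinkelbach(N, D, xs, tol=1e-9, max_iter=100):
    """Minimize N(x)/D(x) over a candidate grid xs via Dinkelbach's method:
    repeatedly solve min_x N(x) - lam * D(x), then set lam to the achieved
    ratio, until the ratio stops changing."""
    lam = 0.0
    x = xs[0]
    for _ in range(max_iter):
        # Parametrized subproblem: minimize N(x) - lam * D(x) on the grid.
        x = xs[np.argmin(N(xs) - lam * D(xs))]
        new_lam = N(x) / D(x)
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return x, N(x) / D(x)

# Toy example: minimize (x^2 + 1) / (x + 2) on [0, 10].
# (Illustrative stand-ins: think of N as an accumulated-AoI-like cost
# and D as elapsed time.)
xs = np.linspace(0.0, 10.0, 100001)
N = lambda x: x**2 + 1.0
D = lambda x: x + 2.0
x_star, ratio = dinkelbach(N, D, xs)
```

The analytic minimizer here is x* = sqrt(5) - 2 with optimal ratio 2*sqrt(5) - 4, so the grid solution lands within the grid spacing of that point; the same outer loop underlies fractional optimization of ratio objectives more broadly.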