Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing

Age of Information (AoI) has become a key timeliness metric for emerging real-time networked applications such as cyber-physical systems (CPS). Mobile edge computing (MEC) is a promising way to meet the heavy computational demands of such applications, for example smart manufacturing within CPS, and thereby reduce AoI. In this work, we study the timeliness of compute-intensive updates and jointly optimize the task updating (when to generate a task) and offloading (where to process a task) policies to minimize AoI. Specifically, we account for edge load dynamics and formulate a task scheduling problem that minimizes the expected time-average AoI. The problem is challenging for two reasons: AoI introduces a fractional objective, and decision-making in the resulting semi-Markov game (SMG) is asynchronous. To this end, we propose a fractional reinforcement learning (RL) framework. We first introduce a fractional single-agent RL framework and establish its linear convergence rate. Building on this, we develop a fractional multi-agent RL framework that extends Dinkelbach's method, and show that it is equivalent to an inexact Newton's method. We further provide conditions under which the framework converges linearly to a Nash equilibrium (NE). To handle asynchronous decision-making in the SMG, we design an asynchronous model-free fractional multi-agent RL algorithm in which each mobile device determines its task updating and offloading decisions without knowing the real-time system dynamics or the decisions of other devices. Experimental results show that, compared with the best existing baseline algorithm, the proposed algorithm reduces the average AoI by up to 50.6%.
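To illustrate why a fractional objective calls for something like Dinkelbach's method, the following is a minimal sketch of the classical (exact) Dinkelbach iteration on a toy problem. It is not the paper's RL algorithm. It assumes a small finite set of hypothetical candidate policies, each summarized by an illustrative pair (expected AoI area per update cycle, expected cycle duration), so that the time-average AoI of a policy is the ratio of the two. The function name `dinkelbach_min_ratio` and the numbers are invented for illustration.

```python
from typing import List, Sequence, Tuple


def dinkelbach_min_ratio(
    candidates: Sequence[Tuple[float, float]],
    tol: float = 1e-9,
    max_iter: int = 100,
) -> Tuple[int, float]:
    """Minimize num/den over (num, den) pairs with den > 0.

    Returns the index of the best candidate and its ratio.
    """
    # Start from the ratio of an arbitrary candidate (an upper bound on the optimum).
    gamma = candidates[0][0] / candidates[0][1]
    best = 0
    for _ in range(max_iter):
        # Inner problem: the ratio is replaced by the parametric form num - gamma * den.
        best = min(
            range(len(candidates)),
            key=lambda i: candidates[i][0] - gamma * candidates[i][1],
        )
        num, den = candidates[best]
        residual = num - gamma * den  # F(gamma); it is <= 0 and reaches 0 at the optimum
        if residual >= -tol:
            break
        # Dinkelbach update: set gamma to the ratio achieved by the inner minimizer.
        gamma = num / den
    num, den = candidates[best]
    return best, num / den


# Hypothetical (AoI area, cycle duration) pairs for four candidate policies.
toy_policies: List[Tuple[float, float]] = [
    (12.0, 4.0),
    (5.0, 2.0),
    (9.0, 4.5),
    (20.0, 8.0),
]
best_index, best_ratio = dinkelbach_min_ratio(toy_policies)
print(best_index, best_ratio)
```

In this sketch the update `gamma = num / den` coincides with a Newton step on F(gamma) = min over candidates of (num - gamma * den), whose slope at the current minimizer is -den. When the inner problem is solved only approximately, as one would expect with a learned policy, the step becomes inexact, which is the setting the abstract refers to.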
