With the rapid development of Mobile Edge Computing (MEC), various real-time applications have been deployed to benefit people's daily lives. The performance of these applications relies heavily on the freshness of the collected environmental information, which can be quantified by its Age of Information (AoI). The traditional definition of AoI assumes that status information can be actively sampled and used directly. However, for many MEC-enabled applications, the desired status information is updated in an event-driven manner and must be processed before it can be used. To better serve these applications, we propose a new definition of AoI and, based on it, formulate an online AoI minimization problem for MEC systems. Notably, the problem can be interpreted as a Markov Decision Process (MDP), which allows it to be solved with Reinforcement Learning (RL) algorithms. Nevertheless, traditional RL algorithms are designed for MDPs with completely unknown system dynamics and hence usually suffer from long convergence times. To accelerate the learning process, we introduce Post-Decision States (PDSs) to exploit partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness. Numerical results demonstrate that our algorithm outperforms the benchmarks under various scenarios.
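As a rough illustration of the traditional AoI that the abstract contrasts against, the sketch below computes the age in discrete time slots: the age at slot t is t minus the generation time of the freshest update available at the destination. The function names `aoi_trace` and `with_processing`, the slotted-time model, and the assumption that an update generated at slot 0 is already available are all hypothetical choices made for this example. The fixed `delay` only illustrates why a processing step matters for freshness; it is not the redefined AoI proposed in the paper, which the abstract does not specify.

```python
def aoi_trace(horizon, deliveries):
    """Classic AoI per slot: current slot minus the generation slot of the
    freshest update held at the destination.

    deliveries maps a delivery slot to the generation slot of the update
    that arrives in it (hypothetical input format for this sketch).
    """
    freshest = 0  # assumption: an update generated at slot 0 is already held
    trace = []
    for t in range(horizon):
        if t in deliveries:
            # Keep only the freshest update; a stale arrival changes nothing.
            freshest = max(freshest, deliveries[t])
        trace.append(t - freshest)
    return trace


def with_processing(deliveries, delay):
    """Illustrative only: shift each delivery by a fixed processing delay,
    so an update becomes usable `delay` slots after it arrives."""
    return {t + delay: g for t, g in deliveries.items()}


deliveries = {3: 2, 6: 4}  # slot 3 delivers an update generated at slot 2, etc.
direct = aoi_trace(8, deliveries)
processed = aoi_trace(8, with_processing(deliveries, 1))
print(direct)     # [0, 1, 2, 1, 2, 3, 2, 3]
print(processed)  # [0, 1, 2, 3, 2, 3, 4, 3]
```

In this toy trace, the one-slot processing delay postpones each reset and raises the value the age resets to, which is the effect that a definition assuming directly usable samples leaves out.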