Meeting the strict Quality of Service (QoS) requirements of terminals imposes a significant challenge on Multi-access Edge Computing (MEC) systems due to their limited multidimensional resources. To address this challenge, we propose a collaborative MEC framework that facilitates resource sharing among edge servers, aiming to maximize the long-term QoS and reduce the cache switching cost through the joint optimization of service caching, collaborative offloading, and computation and communication resource allocation. The dual-timescale feature and the temporal recurrence relationship between service caching and the other resource allocation decisions make the problem even more challenging to solve. To solve it, we propose a deep reinforcement learning (DRL)-based dual-timescale scheme, called DGL-DDPG, which consists of a small-timescale genetic algorithm (GA) and a long short-term memory network-based deep deterministic policy gradient (LSTM-DDPG) algorithm. Specifically, we reformulate the optimization problem as a Markov decision process (MDP) in which the small-timescale resource allocation decisions generated by an improved GA are taken as the states and fed into a centralized LSTM-DDPG agent, which generates the service caching decisions for the large timescale. Simulation results demonstrate that the proposed algorithm outperforms the baseline algorithms in terms of average QoS and cache switching cost.