With the continued growth of IoT applications, their effective scheduling in edge and cloud computing environments has become a critical challenge. The inherent dynamism and stochastic characteristics of edge and cloud computing, as well as of IoT applications themselves, call for highly adaptive solutions. Currently, several centralized Deep Reinforcement Learning (DRL) techniques have been adapted to address the scheduling problem. However, they require large amounts of experience and training time to reach a suitable solution. Moreover, many IoT applications contain multiple interdependent tasks, imposing additional constraints on the scheduling problem. To overcome these challenges, we propose a Transformer-enhanced Distributed DRL scheduling technique, called TF-DDRL, to adaptively schedule heterogeneous IoT applications. This technique follows the Actor-Critic architecture, scales efficiently to multiple distributed servers, and employs an off-policy correction method to stabilize the training process. In addition, Prioritized Experience Replay (PER) and Transformer techniques are introduced to reduce exploration costs and capture long-term dependencies, enabling faster convergence. Extensive practical experiments show that, compared to its counterparts, TF-DDRL significantly reduces response time, energy consumption, monetary cost, and weighted cost by up to 60%, 51%, 56%, and 58%, respectively.
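For concreteness, the sketch below illustrates proportional Prioritized Experience Replay (PER), one of the components named above, in plain Python/NumPy. The class name, buffer API, and hyperparameters (alpha, beta, eps) are illustrative assumptions for the standard proportional PER scheme, not TF-DDRL's actual implementation, which the abstract does not detail.

```python
# A minimal sketch of proportional Prioritized Experience Replay (PER).
# Transitions with larger TD errors are replayed more often, which reduces
# the number of environment interactions needed to learn a good policy.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha              # how strongly priorities skew sampling
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so each one is
        # sampled at least once before its TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias that non-uniform
        # sampling introduces into the gradient estimate.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error;
        # eps keeps every transition sampleable.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In an actor-critic setting such as the one described above, the learner would call sample() for each update, scale the critic loss by the returned weights, and feed the resulting TD errors back through update_priorities().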