In this paper, we examine cloud-edge-terminal IoT networks, in which edges undertake a range of typical dynamic scheduling tasks. In these networks, a central policy for each task can be constructed at a cloud server. The edges conducting the task can then use this central policy, which spares them from learning their own policies from scratch. Furthermore, thanks to the hierarchical architecture of the IoT networks, the central policy can be learned collaboratively at the cloud server by aggregating local experiences from the edges. To this end, we propose a novel collaborative policy learning framework for dynamic scheduling tasks using federated reinforcement learning. For effective learning, our framework adaptively selects the tasks for collaborative learning in each round, taking into account the need for fairness among tasks. In addition, as a key enabler of the framework, we propose an edge-agnostic policy structure that enables the aggregation of local policies from different edges. We then provide a convergence analysis of the framework. Through simulations, we demonstrate that the proposed framework significantly outperforms approaches without collaborative policy learning. Notably, it accelerates policy learning and allows newly arrived edges to adapt to their tasks more easily.