We consider an asynchronous hierarchical federated learning (AHFL) setting with a client-edge-cloud framework. The clients exchange their trained parameters with their corresponding edge servers, which update the locally aggregated model. This model is then transmitted to all the clients in the local cluster. The edge servers communicate with the central cloud server for global model aggregation. The goal of each client is to converge to the global model while maintaining its timeliness, i.e., an optimum training iteration time. We investigate the convergence criteria for such a system with dense clusters. Our analysis shows that, for a system of $n$ clients with fixed average timeliness, convergence in finite time is probabilistically guaranteed if the nodes are divided into $O(1)$ clusters, that is, if the system is built as a sparse set of edge servers, each with a dense client base.
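The client-edge-cloud message flow described above can be illustrated with a minimal sketch. The mixing rule in `client_update`, the weight `mix`, the plain averaging in `cloud_aggregate`, and all class and function names are hypothetical choices made for illustration; the abstract does not specify the update rules, so this should not be read as the analyzed algorithm.

```python
def average(models):
    """Coordinate-wise mean of equal-length parameter vectors."""
    return [sum(coords) / len(models) for coords in zip(*models)]


class EdgeServer:
    """One cluster: an edge server plus the clients attached to it."""

    def __init__(self, num_clients, dim):
        self.model = [0.0] * dim
        self.clients = [[0.0] * dim for _ in range(num_clients)]

    def client_update(self, new_params, mix=0.5):
        # A client reports its trained parameters at an arbitrary time
        # (asynchronously). ASSUMPTION: the edge server folds them into the
        # cluster model with a fixed mixing weight `mix`.
        self.model = [(1 - mix) * m + mix * p
                      for m, p in zip(self.model, new_params)]
        # The updated cluster model is sent to every client in the cluster.
        self.clients = [list(self.model) for _ in self.clients]


def cloud_aggregate(edges):
    # ASSUMPTION: the cloud takes an unweighted average of the edge models
    # and pushes the result back down to every edge server and client.
    global_model = average([e.model for e in edges])
    for e in edges:
        e.model = list(global_model)
        e.clients = [list(global_model) for _ in e.clients]
    return global_model


# Two clusters of three clients each; one client in the first cluster reports.
edges = [EdgeServer(num_clients=3, dim=2) for _ in range(2)]
edges[0].client_update([2.0, 4.0])
global_model = cloud_aggregate(edges)
```

In this sketch, a small number of `EdgeServer` objects each holding many clients corresponds to the sparse-edge, dense-cluster regime the analysis refers to.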