Hierarchical federated learning (HFL) designs introduce intermediate aggregator nodes between clients and the global federated learning server in order to reduce communication costs and distribute server load. One side effect is that machine learning model replication at scale comes "for free" as part of the HFL process: model replicas are hosted at the client end, at intermediate nodes, and at the global server level, and are readily available for serving inference requests. This creates opportunities for efficient model serving, but it simultaneously couples the training and serving processes and calls for their joint orchestration. This is particularly important for continual learning, where a model is served while being (re)trained, whether periodically, upon specific triggers, or continuously, over shared infrastructure spanning the computing continuum. Consequently, training and inference workloads can interfere with each other, with detrimental effects on performance. To address this issue, we propose an inference load-aware HFL orchestration scheme that makes informed decisions on HFL configuration, drawing on knowledge of inference workloads and the available processing capacity. Applying our scheme to a continual learning use case in the transportation domain, we demonstrate that optimizing aggregator node placement and device-aggregator association yields significant inference latency savings while drastically reducing communication costs compared to flat centralized federated learning.
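To give a flavor of the device-aggregator association decision mentioned above, the following is a minimal sketch, not the paper's actual formulation: all names, the greedy heuristic, and the latency proxy (link latency plus a load-to-capacity ratio) are illustrative assumptions.

```python
def assign_devices(devices, aggregators):
    """Greedily associate each device with an aggregator, trading off
    link latency against the inference load already placed on that
    aggregator relative to its capacity.

    Hypothetical data model (an assumption, not from the paper):
      device:     {"id", "inference_rate", "latency": {agg_id: link_latency}}
      aggregator: {"id", "capacity"}
    """
    assignment = {}
    # Running inference load placed on each aggregator so far.
    load = {a["id"]: 0.0 for a in aggregators}

    # Place heavier inference producers first so they get the best slots.
    for d in sorted(devices, key=lambda d: -d["inference_rate"]):
        def cost(a):
            # Latency proxy: link latency + utilization after adding d.
            return d["latency"][a["id"]] + (
                load[a["id"]] + d["inference_rate"]
            ) / a["capacity"]

        best = min(aggregators, key=cost)
        assignment[d["id"]] = best["id"]
        load[best["id"]] += d["inference_rate"]
    return assignment
```

A real orchestrator would solve placement and association jointly (e.g., as an optimization over the whole hierarchy); this greedy pass only illustrates why inference-load awareness changes the association: a nearby but heavily loaded aggregator can lose to a slightly more distant, idle one.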