While providing a machine learning model as a service to process users' inference requests, online applications can periodically upgrade the model using newly collected data. Federated learning (FL) enables models to be trained across distributed clients while keeping the data local. However, existing work has overlooked the coexistence of model training and inference under clients' limited resources. This paper focuses on the joint optimization of model training and inference to maximize inference performance at clients. Such an optimization faces several challenges. The first is to characterize clients' inference performance when clients may participate in FL only partially. To resolve this, we introduce a new notion, the age of model (AoM), to quantify client-side model freshness, and based on it we use FL's global model convergence error as an approximate measure of inference performance. The second challenge is the tight coupling among clients' decisions, including their participation probability in FL, model download probability, and service rates. To address these challenges, we propose an online problem approximation that reduces the problem complexity and allocates resources to balance the needs of model training and inference. Experimental results demonstrate that the proposed algorithm improves the average inference accuracy by up to 12%.
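The abstract does not give a formal definition of AoM. To illustrate how client-side model freshness could depend on a client's model download probability, here is a minimal sketch under an assumed, age-of-information-style definition. In each round the client downloads the latest global model with probability `q`, which resets its AoM to 0. Otherwise the AoM grows by 1. The function names `simulate_aom` and `expected_aom` are hypothetical, and this counter is an illustration only, not the paper's model.

```python
import random


def simulate_aom(download_prob: float, rounds: int, seed: int = 0) -> list[int]:
    """Track a hypothetical age-of-model (AoM) counter for one client.

    Assumption (not from the paper): AoM resets to 0 in any round where the
    client downloads the latest global model (probability download_prob),
    and otherwise increases by 1.
    """
    if not 0.0 < download_prob <= 1.0:
        raise ValueError("download_prob must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    age = 0
    ages = []
    for _ in range(rounds):
        if rng.random() < download_prob:
            age = 0  # client now holds a fresh copy of the global model
        else:
            age += 1  # local model falls one more round behind
        ages.append(age)
    return ages


def expected_aom(download_prob: float) -> float:
    """Stationary mean of the counter above.

    In steady state P(AoM = k) = q * (1 - q)**k, a geometric distribution,
    so the mean is (1 - q) / q.
    """
    return (1.0 - download_prob) / download_prob


if __name__ == "__main__":
    for q in (0.2, 0.5, 0.9):
        ages = simulate_aom(q, rounds=100_000)
        print(f"q={q}: simulated mean AoM {sum(ages) / len(ages):.2f}, "
              f"analytic {expected_aom(q):.2f}")
```

Under this assumed definition, a lower download probability saves a client communication resources, but the expected staleness grows as (1 - q)/q. This is one way to see why download decisions are coupled with the resources left for training and serving inference.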