Federated Learning (FL) allows multiple clients to collaboratively train machine learning models on their private data. However, training and deploying large-scale models on resource-constrained clients is challenging. Split Federated Learning (SFL) offers a feasible solution by alleviating the computation and/or communication burden on clients. Existing SFL works, however, often assume that clients hold sufficient labeled data, which is usually impractical. In addition, non-IID data across clients makes efficient model training harder. To the best of our knowledge, these two issues have not been addressed simultaneously in SFL. We therefore propose Semi-SFL, a novel system that incorporates clustering regularization to perform SFL in the more practical scenario where client data are unlabeled and non-IID. Moreover, our theoretical and experimental investigations into model convergence reveal that inconsistency between the training processes on labeled and unlabeled data affects the effectiveness of clustering regularization. To address this, we develop a control algorithm that dynamically adjusts the global updating frequency to mitigate the training inconsistency and improve training performance. Extensive experiments on benchmark models and datasets show that our system provides a 3.0x speed-up in training time and reduces the communication cost by about 70.3% while reaching the target accuracy, and achieves up to 5.1% improvement in accuracy under non-IID scenarios compared to state-of-the-art baselines.