Federated Learning (FL), a promising distributed machine learning paradigm, has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL are seriously limited by the presence of stragglers and by data imbalance across massive numbers of AIoT devices, respectively. To address these challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical Cache-based aggregation mechanism and a feature Balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices, which aligns training time and mitigates the straggler issue. Specifically, each intermediate model is stored in a low-level cache for local training, and once it has been trained by a sufficient number of local devices, it is stored in a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL adopts the activation distribution as a metric, which enables each intermediate model to be trained across devices with balanced overall data distributions before aggregation. Experimental results show that, compared with state-of-the-art FL methods, CaBaFL achieves up to 9.26x training acceleration and a 19.71% accuracy improvement.
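The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the promotion threshold, the similarity measure, or the aggregation rule, so the names `TwoLevelCache`, `pick_device`, the threshold `k`, the use of cosine similarity, and the plain averaging step are all assumptions made here for illustration. Models and activation distributions are represented as flat lists of floats.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_device(cumulative, candidates, target):
    """Greedy feature-balance selection (assumed rule, not taken from the paper).

    cumulative: activation vector accumulated by one intermediate model so far
    candidates: dict mapping device id -> that device's activation vector
    target:     activation vector representing a balanced distribution
    Returns the device whose activations move `cumulative` closest to `target`.
    """
    def score(dev_id):
        merged = [c + d for c, d in zip(cumulative, candidates[dev_id])]
        return cosine(merged, target)

    # sorted() makes tie-breaking deterministic
    return max(sorted(candidates), key=score)


class TwoLevelCache:
    """Low-level cache holds models still in training; high-level holds models ready to aggregate."""

    def __init__(self, k):
        self.k = k        # hypothetical threshold: devices per model before promotion
        self.low = {}     # model id -> (weights, number of devices trained on)
        self.high = []    # weights awaiting aggregation

    def put(self, model_id, weights):
        self.low[model_id] = (weights, 0)

    def record_training(self, model_id, new_weights):
        """Store the result of one local training pass; promote after k passes."""
        _, count = self.low[model_id]
        count += 1
        if count >= self.k:
            del self.low[model_id]
            self.high.append(new_weights)
        else:
            self.low[model_id] = (new_weights, count)

    def aggregate(self):
        """Average all promoted models (plain mean is an assumption) and clear the high-level cache."""
        if not self.high:
            raise ValueError("no models in the high-level cache")
        n = len(self.high)
        dim = len(self.high[0])
        avg = [sum(w[i] for w in self.high) / n for i in range(dim)]
        self.high.clear()
        return avg
```

In this sketch, a model that has so far seen activations skewed toward one feature is routed to a device that compensates for the skew, and it only reaches the aggregation stage after `k` devices have trained it. This mirrors the abstract's description of aligning training time across intermediate models while balancing the data each one sees.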