In this work, we study how to unleash the potential of massive heterogeneous weak computing power to collaboratively train large-scale models on dispersed datasets. To improve both efficiency and accuracy in resource-adaptive collaborative learning, we take the first step toward simultaneously addressing the \textit{unstructured pruning}, \textit{varying submodel architecture}, \textit{knowledge loss}, and \textit{straggler} challenges. We propose a novel semi-asynchronous collaborative training framework, named $Co\text{-}S^2P$, with data-distribution-aware structured pruning and a cross-block knowledge transfer mechanism to address these concerns. Furthermore, we prove theoretically that $Co\text{-}S^2P$ achieves an asymptotically optimal convergence rate of $O(1/\sqrt{N^*EQ})$. Finally, we conduct extensive experiments on two types of tasks using a real-world hardware testbed comprising diverse IoT devices. The experimental results demonstrate that $Co\text{-}S^2P$ improves accuracy by up to 8.8\% and resource utilization by up to 1.2$\times$ compared to state-of-the-art methods, while reducing memory consumption by approximately 22\% and training time by about 24\% on all resource-limited devices.
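As a brief note on the rate above (a hedged reading, since the excerpt does not define the symbols: interpreting $N^*$ as the number of clients whose updates are aggregated per round, $E$ as the number of local training steps, and $Q$ as the number of global rounds is our assumption), such bounds are typically stated for the averaged gradient norm of the global objective $F$:
\[
  \min_{q \in \{1,\dots,Q\}} \mathbb{E}\big\|\nabla F(\mathbf{x}_q)\big\|^2 \;\le\; O\!\left(\frac{1}{\sqrt{N^* E Q}}\right),
\]
which exhibits the linear-speedup property familiar from synchronous distributed SGD analyses: reaching a target error $\epsilon$ requires $Q = O\big(1/(N^* E \epsilon^2)\big)$ rounds, so doubling the effective number of contributing workers roughly halves the required rounds.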