The proliferation of IoT devices and the growing demands of Deep Learning have exposed significant challenges in Distributed Deep Learning (DDL) systems. Parallel Split Learning (PSL) has emerged as a promising derivative of Split Learning that is well suited to distributed learning on resource-constrained devices. However, PSL faces several obstacles, such as large effective batch sizes, non-IID data distributions, and the straggler effect. We view these issues as a sampling dilemma and propose to address them by orchestrating the mini-batch sampling process on the server side. We introduce the Uniform Global Sampling (UGS) method to decouple the effective batch size from the number of clients and to reduce mini-batch deviation in non-IID settings. To address the straggler effect, we introduce the Latent Dirichlet Sampling (LDS) method, which generalizes UGS to balance the trade-off between batch deviation and training time. Our simulations show that the proposed methods improve model accuracy by up to 34.1% in non-IID settings and reduce training time in the presence of stragglers by up to 62%. In particular, LDS mitigates the straggler effect without compromising model accuracy or adding significant computational overhead compared to UGS. These results demonstrate the potential of our methods as a practical solution for DDL in real-world applications.
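To make the server-side sampling idea concrete, the following minimal NumPy sketch illustrates one plausible reading of both schemes; it is not the paper's implementation. The `tau` and `concentration` parameters are hypothetical knobs standing in for the deviation-versus-time trade-off that LDS controls, and the client sizes and speeds in the usage example are invented.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def uniform_global_sampling(client_sizes, batch_size, rng=rng):
    """UGS sketch: draw one global mini-batch of fixed size uniformly
    from the union of all clients' samples, decoupling the effective
    batch size from the number of clients."""
    sizes = np.asarray(client_sizes, dtype=float)
    probs = sizes / sizes.sum()        # P(a sample comes from client k)
    # Number of samples the server requests from each client this step.
    return rng.multinomial(batch_size, probs)

def latent_dirichlet_sampling(client_sizes, client_speeds, batch_size,
                              tau=1.0, concentration=100.0, rng=rng):
    """Hypothetical LDS sketch: randomize the per-client allocation with
    a Dirichlet draw whose mean is tilted toward faster clients.
    `tau` and `concentration` are assumed knobs, not the paper's notation:
    tau -> 0 recovers the UGS proportions (low batch deviation), while
    larger tau shifts load toward fast clients (shorter step time)."""
    sizes = np.asarray(client_sizes, dtype=float)
    speeds = np.asarray(client_speeds, dtype=float)
    weights = (sizes / sizes.sum()) * speeds**tau
    alpha = concentration * weights / weights.sum()
    probs = rng.dirichlet(alpha)       # fresh random allocation per batch
    return rng.multinomial(batch_size, probs)

# Example: 4 clients with non-IID data sizes; client 3 is a straggler.
counts_ugs = uniform_global_sampling([500, 1000, 2000, 1500], 128)
counts_lds = latent_dirichlet_sampling([500, 1000, 2000, 1500],
                                       [1.0, 1.0, 1.0, 0.2], 128, tau=2.0)
```

Under this reading, the Dirichlet draw is taken once per batch, so over many steps the expected per-client load matches the speed-tilted mean while each individual mini-batch remains randomized, which is one way the batch-deviation/training-time trade-off described above could be realized.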