As data volumes expand rapidly, distributed machine learning has become essential for addressing the growing computational demands of modern AI systems. However, training models in distributed environments is challenging when participants hold skewed, non-independent and identically distributed (non-IID) data. Low-Rank Adaptation (LoRA) offers a promising solution: by personalizing low-rank updates rather than optimizing the entire model, LoRA-enabled distributed learning minimizes computational cost and maximizes personalization for each participant, enabling more robust and efficient training in distributed settings, especially in large-scale, heterogeneous systems. Despite the strengths of current state-of-the-art methods, they often require manual configuration of the initial rank, which becomes increasingly impractical as the number of participants grows. This manual tuning is not only time-consuming but also prone to suboptimal configurations. To address this limitation, we propose AutoRank, an adaptive rank-setting algorithm inspired by the bias-variance trade-off. AutoRank leverages the multiple-criteria decision-analysis (MCDA) method TOPSIS to dynamically assign local ranks based on the complexity of each participant's data. By evaluating data distribution and complexity through our proposed data complexity metrics, AutoRank provides fine-grained adjustments to the rank of each participant's local LoRA model. This adaptive approach effectively mitigates the challenges of double-imbalanced, non-IID data. Experimental results demonstrate that AutoRank significantly reduces computational overhead, enhances model performance, and accelerates convergence in highly heterogeneous federated learning environments. Through its strong adaptability, AutoRank offers a scalable and flexible solution for distributed machine learning.
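To make the TOPSIS-based rank assignment concrete, the sketch below shows the standard TOPSIS scoring procedure applied to per-participant complexity metrics, with scores mapped onto a LoRA rank range. This is a minimal illustration, not the paper's implementation: the criterion weights, benefit directions, and the linear score-to-rank mapping (`assign_ranks`, `r_min`, `r_max`) are assumptions for demonstration.

```python
import numpy as np

def topsis_scores(matrix, weights, benefit):
    """Closeness-to-ideal scores via standard TOPSIS.

    matrix:  (n_participants, n_criteria) data-complexity metrics
    weights: criterion weights (nonnegative)
    benefit: True where larger is better, False where smaller is better
    """
    M = np.asarray(matrix, dtype=float)
    # Vector-normalize each criterion column, then apply weights.
    V = M / np.linalg.norm(M, axis=0) * np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit)
    # Ideal and anti-ideal points depend on each criterion's direction.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)  # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)  # in [0, 1]; higher = closer to ideal

def assign_ranks(scores, r_min=2, r_max=64):
    # Illustrative mapping: linearly place scores onto the rank range,
    # so more complex local data receives a higher LoRA rank.
    return np.rint(r_min + scores * (r_max - r_min)).astype(int)
```

A participant whose metrics dominate on every criterion receives a score of 1 and the maximum rank; one dominated on every criterion receives 0 and the minimum rank, with the rest graded in between.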