The substantial computational demands of modern large-scale deep learning pose significant challenges for efficient training and deployment. Recent research has revealed a widespread phenomenon: deep networks inherently learn low-rank structures in their weights and representations during training. This tutorial paper provides a comprehensive review of advances in exploiting these low-rank structures, bridging mathematical foundations with practical applications. We present two complementary theoretical perspectives on the emergence of low-rankness: one views it through the optimization dynamics of gradient descent during training; the other understands it as the result of implicit regularization at convergence. Practically, these theoretical frameworks provide a foundation for understanding the success of techniques such as Low-Rank Adaptation (LoRA) in fine-tuning, inspire new parameter-efficient low-rank training strategies, and explain the effectiveness of masked training approaches such as dropout and masked self-supervised learning. A minimal code sketch of the LoRA idea follows below.
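To make the low-rank adaptation idea concrete, the following is a minimal PyTorch sketch (not the paper's implementation) of a LoRA-style linear layer: the pretrained weight is frozen and only a rank-r update is trained. The class name `LoRALinear` and the hyperparameters `r` and `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, in_features, out_features, r=8, alpha=16.0):
        super().__init__()
        # Stand-in for a pretrained layer; its parameters are frozen.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # Low-rank factors: A is random, B starts at zero, so the adapted
        # model is initially identical to the pretrained one.
        self.A = nn.Parameter(torch.randn(r, in_features) / r**0.5)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Only the rank-r factors are trained: O(r * (d_in + d_out)) parameters
# instead of O(d_in * d_out) for full fine-tuning.
layer = LoRALinear(768, 768, r=8)
y = layer(torch.randn(4, 768))
```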