The substantial computational demands of modern large-scale deep learning pose significant challenges for efficient training and deployment. Recent research has revealed a widespread phenomenon: deep networks inherently learn low-rank structures in their weights and representations during training. This tutorial paper provides a comprehensive review of advances in identifying and exploiting these low-rank structures, bridging mathematical foundations with practical applications. We present two complementary theoretical perspectives on the emergence of low-rankness: one traces it to the optimization dynamics of gradient descent throughout training, and the other characterizes it as a consequence of implicit regularization at convergence. Practically, these perspectives provide a foundation for understanding the success of techniques such as Low-Rank Adaptation (LoRA) in fine-tuning, inspire new parameter-efficient low-rank training strategies, and explain the effectiveness of masked training approaches such as dropout and masked self-supervised learning.
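To make the low-rank idea behind LoRA concrete, here is a minimal NumPy sketch; the dimensions `d`, `k`, and rank `r` are illustrative assumptions, not values from the paper. A frozen pretrained weight `W0` is adapted through a product of two thin factors `B` and `A`, so the update `B @ A` has rank at most `r` and only `r * (d + k)` parameters are trained instead of `d * k`.

```python
import numpy as np

# Hypothetical dimensions: a d x k weight matrix adapted at rank r << min(d, k).
d, k, r = 1024, 1024, 8

rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, k))        # pretrained weight, kept frozen
B = np.zeros((d, r))                    # trainable low-rank factor, zero-initialized
A = rng.standard_normal((r, k)) * 0.01  # trainable low-rank factor, small random init

def lora_forward(x):
    # Effective weight is W0 + B @ A; the correction B @ A has rank at most r.
    return x @ (W0 + B @ A).T

# Trainable parameters drop from d*k to r*(d + k).
print(f"full fine-tuning: {d * k:,} params, LoRA: {r * (d + k):,} params")
```

Zero-initializing `B` (so the adapted model starts identical to the pretrained one) follows the convention of the original LoRA paper; the specific scale of `A` here is an assumption for illustration only.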