Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training requires tuning several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challenge by introducing Cuttlefish, an automated low-rank training approach that eliminates the need to tune factorization hyperparameters. Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to its corresponding stable rank. Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models and trains up to 1.2 times faster end to end while preserving comparable accuracy. Moreover, Cuttlefish outperforms state-of-the-art low-rank model training methods and other prominent baselines. The source code for our implementation can be found at: https://github.com/hwang595/Cuttlefish.
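To make the quantities in the abstract concrete, the following is a minimal NumPy sketch, not the Cuttlefish implementation. It assumes the standard definition of stable rank, $\|W\|_F^2 / \|W\|_2^2$, which equals the sum of squared singular values divided by the largest squared singular value. The helper names (`stable_rank`, `has_converged`, `factorize`), the convergence tolerance `tol`, and the use of a truncated SVD to initialize the low-rank factors are illustrative assumptions; the criterion and rank estimate used in the actual method may differ.

```python
import numpy as np


def stable_rank(weight):
    """Stable rank ||W||_F^2 / ||W||_2^2 of a nonzero 2-D weight matrix."""
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(weight, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def has_converged(history, tol=0.1):
    """Check whether every layer's stable rank changed by at most `tol`
    between the last two epochs.

    `history` is a list with one entry per epoch; each entry is a list of
    per-layer stable ranks. `tol` is a hypothetical threshold.
    """
    if len(history) < 2:
        return False
    prev = np.asarray(history[-2], dtype=float)
    curr = np.asarray(history[-1], dtype=float)
    return bool(np.all(np.abs(curr - prev) <= tol))


def factorize(weight, rank):
    """Split W (m x n) into U (m x rank) and V^T (rank x n) by truncated SVD.

    This is one common way to initialize low-rank factors; it is assumed
    here for illustration only.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    # Spread the singular values evenly across the two factors.
    return u[:, :rank] * root, root[:, None] * vt[:rank]
```

In a training loop, one would record `stable_rank` for each layer after every full-rank epoch, and once `has_converged` returns true, replace each layer's weight with the two factors from `factorize`, using a rank obtained by rounding that layer's stable rank to an integer of at least 1.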