On-device training is essential for neural networks (NNs) to continuously adapt to new online data, but it can be time-consuming due to the device's limited computing power. To speed up on-device training, existing schemes either select the trainable NN portion offline or make an unrecoverable selection at runtime. In both cases, the evolution of the trainable NN portion is constrained and cannot adapt to the current training needs. Instead, runtime adaptation of on-device training should be fully elastic, i.e., every NN substructure can be freely removed from or added to the trainable NN portion at any time during training. In this paper, we present ElasticTrainer, a new technique that enforces such elasticity to achieve the required training speedup with minimum NN accuracy loss. Experimental results show that, compared to existing schemes, ElasticTrainer achieves up to 3.5x more training speedup in wall-clock time and reduces energy consumption by 2x-3x more, without noticeable accuracy loss.