This work offers a heuristic evaluation of how variations in machine learning training regimes and learning paradigms affect the energy consumption of computing hardware, especially HPC hardware, from a life-cycle-aware perspective. While growing data availability and innovation in high-performance hardware fuel the training of sophisticated models, they also dull awareness of energy consumption and carbon emissions. The goal of this work is therefore to raise awareness of the energy impact of general training parameters and processes, ranging from learning rate and batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among other results, we find that, even with the same model and hardware and the same target accuracy, poorly chosen training hyperparameters can consume up to 5 times the energy of the optimal setup. We also extensively examine the energy-saving benefits of learning paradigms, including recycling knowledge through pretraining and sharing knowledge through multitask training.