This work presents a heuristic, life-cycle-aware evaluation of how variations in machine learning training regimes and learning paradigms affect the energy consumption of computing hardware, particularly HPC hardware. While growing data availability and innovation in high-performance hardware fuel the training of sophisticated models, they also dull awareness of energy consumption and carbon emissions. The goal of this work is therefore to raise awareness of the energy impact of general training parameters and processes, from learning rate and batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among other results, we find that even with the same model and hardware, and when reaching the same accuracy, poorly chosen training hyperparameters can consume up to 5 times the energy of the optimal setup. We also extensively examine the energy-saving benefits of learning paradigms, including recycling knowledge through pretraining and sharing knowledge through multitask training.