Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures

Deep Learning (DL) frameworks such as PyTorch and TensorFlow include runtime infrastructures responsible for executing trained models on target hardware and for managing memory, data transfers, and, where applicable, multi-accelerator execution. It is also common practice to deploy pre-trained models in environments distinct from their native development settings. This led to the introduction of interchange formats such as ONNX, which comes with its own runtime infrastructure, ONNX Runtime, and serves as a standard format usable across diverse DL frameworks and languages. Even though these runtime infrastructures have a great impact on inference performance, no previous paper has investigated their energy efficiency. In this study, we monitor energy consumption and inference time in the runtime infrastructures of three well-known DL frameworks as well as ONNX, using three different DL models. To add nuance to our investigation, we also examine the impact of using different execution providers. We find that the performance and energy efficiency of DL are difficult to predict. One framework, MXNet, outperforms both PyTorch and TensorFlow for the computer vision models at batch size 1, owing to efficient GPU usage and correspondingly low CPU usage. At batch size 64, however, PyTorch and MXNet become practically indistinguishable, while TensorFlow is consistently outperformed. For BERT, PyTorch exhibits the best performance. Converting the models to ONNX usually yields significant performance improvements, but the ONNX-converted ResNet model with batch size 64 consumes approximately 10% more energy and time than the original PyTorch model.
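The abstract does not say how energy was derived from the monitored data. As a minimal sketch, assuming power is sampled in watts at known timestamps during an inference run, energy can be approximated by integrating power over time, and two runs can then be compared as a relative difference. The function names and sample values below are hypothetical illustrations, not the authors' tooling or measurements.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Approximate energy (J) by trapezoidal integration of power (W) over time (s)."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def relative_overhead_percent(candidate: float, baseline: float) -> float:
    """How much larger `candidate` is than `baseline`, in percent."""
    return 100.0 * (candidate - baseline) / baseline


if __name__ == "__main__":
    # Made-up samples for illustration only (not data from the study).
    t = [0.0, 1.0, 2.0]
    baseline_run = energy_joules(t, [100.0, 100.0, 100.0])
    candidate_run = energy_joules(t, [110.0, 110.0, 110.0])
    print(baseline_run, candidate_run,
          relative_overhead_percent(candidate_run, baseline_run))
```

The same relative-difference calculation applies to inference time, which is how a statement such as "approximately 10% more energy and time" can be read.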

