Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures

Deep Learning (DL) frameworks such as PyTorch and TensorFlow include runtime infrastructures responsible for executing trained models on target hardware and for managing memory, data transfers, and, where applicable, multi-accelerator execution. It is also common practice to deploy pre-trained models in environments distinct from their native development settings. This led to the introduction of interchange formats such as ONNX, together with its runtime infrastructure, ONNX Runtime, which serve as a standard that can be used across diverse DL frameworks and languages. Even though these runtime infrastructures have a great impact on inference performance, no previous paper has investigated their energy efficiency. In this study, we monitor the energy consumption and inference time of the runtime infrastructures of three well-known DL frameworks, as well as ONNX, using three different DL models. To add nuance to our investigation, we also examine the impact of using different execution providers. We find that the performance and energy efficiency of DL are difficult to predict. One framework, MXNet, outperforms both PyTorch and TensorFlow for the computer vision models at batch size 1, due to efficient GPU usage and thus low CPU usage. However, at batch size 64, PyTorch and MXNet become practically indistinguishable, while TensorFlow is consistently outperformed. For BERT, PyTorch exhibits the best performance. Converting the models to ONNX yields significant performance improvements in the majority of cases. Finally, in our preliminary investigation of execution providers, we observe that TensorRT always outperforms CUDA.
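As a rough illustration of the two quantities the abstract says are monitored (energy consumption and inference time), the following is a minimal sketch and not the authors' measurement setup. It assumes power readings are already available as (timestamp in seconds, power in watts) pairs from some external sampler; the function names `energy_joules` and `time_inference` are hypothetical and introduced here only for illustration.

```python
import time
from typing import Callable, Sequence, Tuple


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Approximate energy by integrating (timestamp_s, power_w) samples.

    Uses the trapezoidal rule between consecutive samples. The sample
    source (e.g. a GPU or CPU power sampler) is assumed, not specified.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def time_inference(run_once: Callable[[], object], repeats: int = 10) -> float:
    """Return the mean wall-clock time in seconds per call of run_once."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) / repeats
```

Here `run_once` would wrap a single inference call in whichever runtime is being compared. Dividing the integrated energy by the number of inferences in the sampling window gives a per-inference figure that can be compared across runtimes.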

