The use of generative AI-based coding assistants such as ChatGPT and GitHub Copilot is a reality in contemporary software development. Many of these tools are provided as remote APIs, and relying on third-party APIs raises data privacy and security concerns for client companies, which motivates the use of locally deployed language models. In this study, we explore the trade-off between model accuracy and energy consumption, aiming to provide insights that help developers make informed decisions when selecting a language model. We investigate the performance of 18 families of LLMs on typical software development tasks using two real-world infrastructures: a commodity GPU and a powerful AI-specific GPU. Since deploying LLMs locally requires powerful hardware that may not be affordable for everyone, we consider both full-precision and quantized models. Our findings reveal that employing a large LLM with a higher energy budget does not always translate into significantly improved accuracy. Moreover, quantized versions of large models generally offer better efficiency and accuracy than full-precision versions of medium-sized ones. Finally, no single model is suitable for all types of software development tasks.