The energy consumption of Large Language Models (LLMs) raises growing concerns due to its adverse effects on environmental sustainability and resource use. Yet these energy costs remain largely opaque to users, especially when models are accessed through an API -- a black box in which all available information depends on what providers choose to disclose. In this work, we investigate inference-time measurements as a proxy for the energy costs of API-based LLMs. We ground our approach by comparing our estimates with actual energy measurements from locally hosted equivalents. Our results show that timing measurements allow us to infer the GPU models serving API-based LLMs, grounding our energy cost estimates. Our work aims to provide means for understanding the energy costs of API-based LLMs, especially for end users.