The carbon footprint of large language models (LLMs) is a significant concern. It covers emissions from training, inference, experimentation, and storage, and includes both operational and embodied carbon. An essential step is accurately estimating the carbon impact of an emerging LLM before it is trained, a process that relies heavily on GPU usage. Existing studies have reported the carbon footprint of LLM training, but only one tool, mlco2, can predict the carbon footprint of a new neural network prior to physical training. However, mlco2 has several serious limitations: it cannot extend its estimation to dense or mixture-of-experts (MoE) LLMs, disregards critical architectural parameters, focuses solely on GPUs, and cannot model embodied carbon footprints. To address these gaps, we introduce \textit{LLMCarbon}, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs. Compared to mlco2, LLMCarbon significantly improves the accuracy of carbon footprint estimates for various LLMs.