Large Language Models (LLMs) are now widely used across domains including search engines, code generation, and text creation. However, a major concern with their adoption is the high cost of inference, which affects both their sustainability and their financial feasibility. In this study, we empirically examine how different prompt and response characteristics directly affect the energy cost of LLM inference. We conduct experiments with three open-source transformer-based LLMs across three task types: question answering, sentiment analysis, and text generation. For each inference, we analyze prompt and response characteristics (length, semantic meaning, time taken, and energy consumption). Our results show that, even when presented with identical tasks, models generate responses with varying characteristics and consequently exhibit distinct energy consumption patterns. We find that prompt length matters less for energy consumption than the semantic meaning of the task itself. In addition, we identify specific keywords associated with higher or lower energy usage, and these keywords vary across tasks. These findings highlight the importance of prompt design in optimizing inference efficiency. We conclude that the semantic meaning of prompts and certain task-related keywords significantly affect inference costs, paving the way for deeper exploration towards energy-adaptive LLMs.