A rapidly growing number of large language models (LLMs) can be queried for a fee. We review the cost of querying popular LLM APIs (e.g., GPT-4, ChatGPT, J1-Jumbo) and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost of using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade that learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g., GPT-4) with up to 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
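To make the LLM cascade idea concrete, the following is a minimal sketch, not the FrugalGPT implementation. It assumes that models are tried in order of increasing cost and that an answer is accepted once a reliability score reaches a per-model threshold. The names `Tier`, `cascade`, `small_model`, `large_model`, and `toy_scorer`, as well as the costs and thresholds, are hypothetical placeholders. In practice the model calls would be real API requests, and the scorer and thresholds would be learned from data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost: float                   # hypothetical cost per query
    answer: Callable[[str], str]  # stand-in for a call to an LLM API
    threshold: float              # accept the answer if its score >= threshold


def cascade(
    query: str,
    tiers: List[Tier],
    scorer: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Query tiers in order; stop at the first answer the scorer accepts.

    Returns (answer, name of the tier that produced it, total cost spent).
    The last tier's answer is always accepted, since there is no fallback.
    """
    spent = 0.0
    for i, tier in enumerate(tiers):
        result = tier.answer(query)
        spent += tier.cost
        is_last = i == len(tiers) - 1
        if is_last or scorer(query, result) >= tier.threshold:
            return result, tier.name, spent
    raise ValueError("tiers must not be empty")


# Toy stand-ins (assumptions for illustration only).
def small_model(q: str) -> str:
    return "4" if q == "2+2?" else "unsure"


def large_model(q: str) -> str:
    return "42"


def toy_scorer(q: str, a: str) -> float:
    return 0.0 if a == "unsure" else 1.0


tiers = [
    Tier("small", 1.0, small_model, 0.5),
    Tier("large", 30.0, large_model, 0.5),
]

print(cascade("2+2?", tiers, toy_scorer))           # ('4', 'small', 1.0)
print(cascade("hard question", tiers, toy_scorer))  # ('42', 'large', 31.0)
```

In this sketch the easy query is answered by the cheap model alone, while the hard query pays for both models. Under these assumptions, any savings depend on how many queries the cheaper tiers can answer reliably.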