Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently increased with model size, but so has the monetary cost of querying ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs to obtain a satisfactory solution, whereas for others smaller LMs suffice. Based on this observation, we design a framework for Cost-Effective Language Model Choice (CELMOC). Given a set of inputs and a set of candidate LMs, CELMOC judiciously assigns each input to an LM predicted to do well on that input according to a so-called meta-model, aiming to achieve high overall performance at low cost. The user can flexibly tune the cost-performance trade-off. Options include, among others, maximizing total expected performance (or the number of processed inputs) while staying within a given cost budget, or minimizing total cost while processing all inputs. We evaluate CELMOC on 14 datasets covering five natural language tasks, using four candidate LMs of vastly different sizes and costs. With CELMOC, we match the performance of the largest available LM while reducing cost by 63%. Via our publicly available library, researchers and practitioners can thus save large amounts of money without sacrificing performance.
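To make the budget-constrained option concrete, the following is a minimal sketch of one way to assign inputs to LMs given meta-model predictions. It is not the CELMOC library's API or the authors' optimization procedure: the `Candidate` class, the `assign_within_budget` function, the per-query costs, and the greedy upgrade heuristic are all assumptions introduced here for illustration. The sketch starts every input on the cheapest model and then repeatedly applies the switch with the best predicted gain per unit of extra cost that still fits the budget.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str    # hypothetical model label
    cost: float  # hypothetical cost of one query to this model


def assign_within_budget(pred, candidates, budget):
    """Greedy sketch (an assumption, not the paper's solver).

    pred[i][j] is the meta-model's predicted performance of candidate j
    on input i. Returns one candidate index per input, or None if even
    the all-cheapest assignment exceeds the budget.
    """
    cheapest = min(range(len(candidates)), key=lambda j: candidates[j].cost)
    assignment = [cheapest] * len(pred)
    spent = candidates[cheapest].cost * len(pred)
    if spent > budget:
        return None

    while True:
        best = None  # (gain per extra cost, input index, candidate index, extra cost)
        for i, row in enumerate(pred):
            cur = assignment[i]
            for j, cand in enumerate(candidates):
                gain = row[j] - row[cur]
                extra = cand.cost - candidates[cur].cost
                # Skip switches that do not help or do not fit the budget.
                if gain <= 0 or spent + extra > budget:
                    continue
                # A switch that costs nothing extra is always worth taking.
                ratio = gain / extra if extra > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, i, j, extra)
        if best is None:
            return assignment
        _, i, j, extra = best
        assignment[i] = j
        spent += extra


# Toy example with made-up predictions and costs.
models = [Candidate("small", 1), Candidate("large", 10)]
predicted = [[0.90, 0.95], [0.20, 0.90], [0.50, 0.60]]
print(assign_within_budget(predicted, models, budget=12))  # -> [0, 1, 0]
```

In this toy run only the second input is routed to the large model, because that is where the predicted gain per unit of cost is highest and the budget allows a single upgrade. A greedy heuristic like this is not guaranteed to find the optimal assignment; it only illustrates the shape of the trade-off.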