Generative language models (LMs) have become ubiquitous in data science. For a wide variety of tasks, inputs can be phrased as natural-language prompts for an LM, and the solution can then be extracted from the LM's output. LM performance has consistently increased with model size, but so has the monetary cost of querying ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs to obtain a satisfactory solution, whereas for others smaller LMs suffice. Based on this observation, we design a framework for Cost-Effective Language Model Choice (CELMOC). Given a set of inputs and a set of candidate LMs, CELMOC judiciously assigns each input to an LM that a so-called meta-model predicts will perform well on it, aiming to achieve high overall performance at low cost. The user can flexibly tune the cost-performance trade-off. Options include, among others, maximizing total expected performance (or the number of processed inputs) while staying within a given cost budget, or minimizing total cost while processing all inputs. We evaluate CELMOC on 14 datasets covering five natural language tasks, using four candidate LMs of vastly different size and cost. With CELMOC, we match the performance of the largest available LM while reducing cost by 63%. Via our publicly available library, researchers and practitioners can thus save large amounts of money without sacrificing performance.
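To make the budget-constrained option concrete, here is a minimal sketch, assuming that the meta-model's predicted performance `perf[i][j]` of LM `j` on input `i` and an integer query cost `cost[i][j]` are already available. Under these assumptions, choosing exactly one LM per input to maximize total expected performance within a budget is a multiple-choice knapsack problem, solved here by dynamic programming over the budget. The function name `assign_within_budget` and its inputs are hypothetical illustrations, not CELMOC's actual solver or library API.

```python
from typing import List, Tuple


def assign_within_budget(
    perf: List[List[float]], cost: List[List[int]], budget: int
) -> Tuple[float, List[int]]:
    """Multiple-choice knapsack DP (hypothetical sketch, not CELMOC's API).

    perf[i][j]: meta-model's predicted performance of LM j on input i (assumed given)
    cost[i][j]: integer cost of querying LM j on input i (assumed given)
    Returns (max total expected performance, chosen LM index per input),
    assigning every input to exactly one LM with total cost <= budget.
    """
    NEG = float("-inf")
    # best[b]: max expected performance over inputs seen so far using at most b cost units.
    best = [0.0] * (budget + 1)
    choices: List[List[int]] = []  # choices[i][b]: LM picked for input i at budget b
    for p_row, c_row in zip(perf, cost):
        new_best = [NEG] * (budget + 1)
        choice_i = [-1] * (budget + 1)
        for b in range(budget + 1):
            for j, (p, c) in enumerate(zip(p_row, c_row)):
                # Spend c on this input and the remaining b - c on the earlier inputs.
                if c <= b and best[b - c] != NEG:
                    val = best[b - c] + p
                    if val > new_best[b]:
                        new_best[b] = val
                        choice_i[b] = j
        best = new_best
        choices.append(choice_i)
    if best[budget] == NEG:
        raise ValueError("budget too small to process all inputs")
    # Backtrack from the full budget to recover the per-input LM choices.
    assignment = [0] * len(perf)
    b = budget
    for i in range(len(perf) - 1, -1, -1):
        j = choices[i][b]
        assignment[i] = j
        b -= cost[i][j]
    return best[budget], assignment


# Toy example with made-up numbers: a cheap LM (cost 1) and an expensive LM (cost 3).
perf = [[0.9, 0.95], [0.2, 0.9], [0.5, 0.8]]
cost = [[1, 3], [1, 3], [1, 3]]
value, assignment = assign_within_budget(perf, cost, budget=5)
print(round(value, 2), assignment)  # 2.3 [0, 1, 0]
```

With a budget of 5, the sketch spends the expensive LM only on the input where the predicted gain is largest (input 1) and sends the others to the cheap LM. Adding a zero-cost, zero-performance "skip" column to each row would, under the same assumptions, let some inputs go unprocessed when the budget is tight.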