Large Language Models (LLMs) have sparked significant interest in their generative capabilities, leading to the development of various commercial applications. The high cost of using these models drives application builders to maximize the value of generation under a limited inference budget. This paper presents a study of optimizing inference hyperparameters, such as the number of responses, temperature, and max tokens, which significantly affect the utility/cost of text generation. We design a framework named EcoOptiGen, which leverages economical hyperparameter optimization and cost-based pruning. Experiments with the GPT-3.5/GPT-4 models on a variety of tasks verify its effectiveness. EcoOptiGen is implemented in the `autogen' package of the FLAML library: \url{https://aka.ms/autogen}.
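To make the idea of budget-constrained tuning concrete, the following is a minimal sketch of a random search over the three hyperparameters named above (number of responses, temperature, max tokens) that skips any configuration whose worst-case cost would exceed a budget. It is not the EcoOptiGen algorithm and does not use the FLAML API. The price, the prompt length, the search ranges, and the `toy_utility` function are all hypothetical stand-ins chosen for illustration, and no model is actually called.

```python
import random

# Hypothetical constants, chosen only for illustration.
PRICE_PER_1K_TOKENS = 0.002  # assumed price, not a real quote
PROMPT_TOKENS = 200          # assumed prompt length per example


def worst_case_cost(config, num_examples):
    """Upper bound on the cost of evaluating one configuration.

    Assumes every one of the n responses uses all max_tokens tokens.
    """
    tokens = PROMPT_TOKENS + config["n"] * config["max_tokens"]
    return num_examples * tokens * PRICE_PER_1K_TOKENS / 1000


def toy_utility(config, rng):
    """Stand-in for a real task metric; not a model of any actual LLM."""
    per_sample = min(0.6, 0.1 + config["max_tokens"] / 500)
    per_sample *= 1 - 0.3 * abs(config["temperature"] - 0.7)
    # Chance that at least one of the n responses succeeds.
    success = 1 - (1 - per_sample) ** config["n"]
    return success + rng.uniform(-0.01, 0.01)


def search(total_budget, per_config_budget, num_examples=20, trials=50, seed=0):
    """Random search that prunes configurations by worst-case cost."""
    rng = random.Random(seed)
    spent, pruned, best = 0.0, 0, None
    for _ in range(trials):
        config = {
            "n": rng.randint(1, 10),
            "temperature": rng.uniform(0.0, 1.5),
            "max_tokens": rng.choice([50, 100, 200, 400]),
        }
        cost = worst_case_cost(config, num_examples)
        # Prune before spending anything on this configuration.
        if cost > per_config_budget or spent + cost > total_budget:
            pruned += 1
            continue
        spent += cost
        utility = toy_utility(config, rng)
        if best is None or utility > best[0]:
            best = (utility, config, cost)
    return best, spent, pruned


if __name__ == "__main__":
    best, spent, pruned = search(total_budget=1.0, per_config_budget=0.05)
    print("best:", best)
    print("spent:", round(spent, 4), "pruned:", pruned)
```

In this sketch, pruning happens before any evaluation, using only a cost upper bound. A practical system would replace `toy_utility` with a measured task metric and could use actual token counts rather than the worst case.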