Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading an increasing number of companies to offer LLMs as services. Different LLMs achieve different performance at different costs. The challenge for users lies in choosing the LLMs that best fit their needs while balancing cost and performance. In this paper, we propose a framework for addressing the cost-effective query allocation problem for LLMs. Given a set of input queries and candidate LLMs, our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligned with their budget constraints and performance preferences, including options for maximizing accuracy and minimizing cost. OptLLM predicts the performance of candidate LLMs on each query using a multi-label classification model with uncertainty estimation, and then iteratively generates a set of non-dominated solutions through destruction and reconstruction of the current solution. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing. Our experimental results demonstrate that OptLLM reduces costs by 2.40% to 49.18% while achieving the same accuracy as the best LLM. Compared to other multi-objective optimization algorithms, OptLLM improves accuracy by 2.94% to 69.05% at the same cost, or saves costs by 8.79% to 95.87% while maintaining the highest attainable accuracy.
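To make the notion of a non-dominated solution concrete, the following minimal sketch scores query-to-LLM allocations on two objectives (total cost and expected accuracy) and keeps only those that no other allocation beats on both. It is an illustration under stated assumptions, not OptLLM itself: the function names, the toy `cost` and `p_correct` matrices, and the brute-force enumeration are all hypothetical, and the enumeration stands in for the destruction-and-reconstruction search, which is not reproduced here.

```python
from itertools import product
from typing import List, Sequence, Tuple

Solution = Tuple[float, float]  # (total cost, expected accuracy)


def evaluate(allocation: Sequence[int],
             cost: Sequence[Sequence[float]],
             p_correct: Sequence[Sequence[float]]) -> Solution:
    """Score an allocation, where allocation[i] is the LLM index used for query i."""
    total_cost = sum(cost[i][m] for i, m in enumerate(allocation))
    accuracy = sum(p_correct[i][m] for i, m in enumerate(allocation)) / len(allocation)
    return total_cost, accuracy


def dominates(a: Solution, b: Solution) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Return the Pareto front, sorted by increasing cost."""
    unique = set(solutions)
    front = [s for s in unique if not any(dominates(t, s) for t in unique)]
    return sorted(front)


# Hypothetical toy data: 2 queries x 2 candidate LLMs (LLM 0 is cheap, LLM 1 is expensive).
cost = [[1.0, 4.0],
        [1.0, 4.0]]
# Assumed predicted probability that each LLM answers each query correctly.
p_correct = [[0.5, 1.0],
             [1.0, 1.0]]

# Brute-force enumeration is only feasible for toy sizes; it is used here for clarity.
candidates = [evaluate(alloc, cost, p_correct)
              for alloc in product(range(2), repeat=2)]
front = non_dominated(candidates)
print(front)
```

On this toy input, the front keeps the all-cheap allocation and the allocation that sends only the harder first query to the expensive LLM. A user with a tight budget would pick the former, and one who prioritizes accuracy would pick the latter.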