ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries

In recent years, large language models (LLMs) have demonstrated remarkable capabilities in comprehending and generating natural language content, attracting widespread attention in both industry and academia. An increasing number of services offer LLMs for various tasks via APIs. Different LLMs show expertise in different query domains (e.g., text classification queries), and LLMs of different scale, complexity, and performance are priced differently. Motivated by this, several researchers are investigating strategies for selecting an ensemble of LLMs, aiming to decrease overall usage costs while enhancing performance. However, to the best of our knowledge, no existing work addresses the problem of finding, subject to a cost budget, an LLM ensemble that maximizes ensemble performance with guarantees. In this paper, we formalize the performance of an ensemble of models (LLMs) using the notion of correctness probability, which we formally define. We develop an approach for aggregating responses from multiple LLMs to enhance ensemble performance. Building on this, we formulate the Optimal Ensemble Selection problem: select a set of LLMs, subject to a cost budget, that maximizes the overall correctness probability. We show that the correctness probability function is non-decreasing and non-submodular, and we provide evidence that the Optimal Ensemble Selection problem is likely to be NP-hard. By leveraging a submodular function that upper bounds the correctness probability, we develop an algorithm called ThriftLLM and prove that it achieves an instance-dependent approximation guarantee with high probability. Our framework functions as a data processing system that selects appropriate LLM operators to deliver high-quality results under budget constraints.
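
To make the problem statement concrete, the following is a minimal sketch of budget-constrained ensemble selection on a toy instance. It is not the ThriftLLM algorithm: it uses exhaustive search, which is exponential in the number of models, and it rests on simplifying assumptions that the abstract does not state, namely binary labels, models that err independently, known per-model accuracies strictly between 0 and 1, and aggregation by a log-odds weighted vote. The names `correctness_probability` and `select_ensemble` are hypothetical.

```python
from itertools import combinations, product
from math import log


def correctness_probability(accuracies):
    """Probability that a log-odds weighted vote returns the true label.

    Assumptions (illustrative only): binary labels, independent models,
    and each accuracy strictly between 0 and 1.
    """
    if not accuracies:
        return 0.5  # no model consulted: uniform guess between two labels
    weights = [log(p / (1.0 - p)) for p in accuracies]
    total = 0.0
    # Enumerate every pattern of "model i is correct / incorrect".
    for pattern in product([True, False], repeat=len(accuracies)):
        prob = 1.0
        margin = 0.0  # weight voting for the true label minus weight against it
        for ok, p, w in zip(pattern, accuracies, weights):
            prob *= p if ok else (1.0 - p)
            margin += w if ok else -w
        if abs(margin) < 1e-12:
            total += 0.5 * prob  # tie: broken by a fair coin flip
        elif margin > 0:
            total += prob
    return total


def select_ensemble(models, budget):
    """Exhaustively pick the affordable subset with the highest correctness probability.

    `models` maps a model name to an (accuracy, cost) pair.
    Smaller subsets are enumerated first and win ties.
    """
    names = list(models)
    best_subset, best_prob = (), correctness_probability([])
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(models[name][1] for name in subset)
            if cost > budget:
                continue
            prob = correctness_probability([models[name][0] for name in subset])
            if prob > best_prob + 1e-12:
                best_subset, best_prob = subset, prob
    return best_subset, best_prob


if __name__ == "__main__":
    # Hypothetical pool: three cheap models and one expensive, more accurate model.
    pool = {"a": (0.7, 1.0), "b": (0.7, 1.0), "c": (0.7, 1.0), "large": (0.85, 5.0)}
    for budget in (3.0, 5.0):
        print(budget, select_ensemble(pool, budget))
```

In this toy pool, a budget of 3 can only afford the three cheap models, whose combined vote is correct with probability 0.784, while a budget of 5 makes the single expensive model (0.85) the better choice. The sketch only illustrates the trade-off between cost and correctness probability that the selection problem captures; it says nothing about how ThriftLLM itself searches the space.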
