With the rapid advancement of large language models (LLMs), both the diversity of multi-LLM tasks and the variability in LLM pricing have become increasingly important, since costs can differ greatly from one LLM to another. To tackle these challenges, we introduce \textit{C2MAB-V}, a \underline{C}ost-effective \underline{C}ombinatorial \underline{M}ulti-armed \underline{B}andit with \underline{V}ersatile reward models for optimal LLM selection and usage. Unlike traditional static approaches, or approaches that rely on a single LLM without considering cost, \textit{C2MAB-V} is an online model. With multiple LLMs deployed on a scheduling cloud and a local server dedicated to handling user queries, \textit{C2MAB-V} selects multiple LLMs over a combinatorial search space and is tailored to various collaborative task types with different reward models. Using our online feedback mechanism and confidence bound technique, \textit{C2MAB-V} addresses the multi-LLM selection challenge by managing the exploration-exploitation trade-off across models while balancing cost and reward for diverse tasks. Selecting multiple LLMs under these trade-offs is an NP-hard integer linear programming problem, which we address in three steps: i) the local server relaxes the integer problem into a continuous form, ii) the scheduling cloud applies a discretization rounding scheme that provides optimal LLM combinations, and iii) the estimates are continually updated online based on feedback. Theoretically, we prove that \textit{C2MAB-V} offers strict guarantees over versatile reward models, matching state-of-the-art results for regret and violations in some degenerate cases. Empirically, we show that \textit{C2MAB-V} effectively balances performance and cost-efficiency with nine LLMs across three application scenarios.
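The relax, round, and update loop described above can be illustrated with a minimal Python sketch. It is not the \textit{C2MAB-V} algorithm itself, and it rests on several simplifying assumptions that do not come from the abstract: the reward is a plain sum over the selected LLMs, there is a single per-round budget constraint (so the relaxation reduces to a fractional knapsack that a greedy pass solves exactly), rounding is independent randomized rounding rather than the paper's discretization rounding scheme, and rewards are simulated Bernoulli draws with fixed costs. All names (`ArmStats`, `optimistic_estimates`, `solve_relaxation`, `round_selection`, `run`) and the confidence-radius constant are hypothetical.

```python
import math
import random


class ArmStats:
    """Running means of observed reward and cost for one LLM (one arm)."""

    def __init__(self):
        self.pulls = 0
        self.mean_reward = 0.0
        self.mean_cost = 0.0

    def update(self, reward, cost):
        self.pulls += 1
        self.mean_reward += (reward - self.mean_reward) / self.pulls
        self.mean_cost += (cost - self.mean_cost) / self.pulls


def optimistic_estimates(stats, t):
    """Return (reward upper bound, cost lower bound), both clipped to [0, 1]."""
    if stats.pulls == 0:
        return 1.0, 0.0  # never tried: assume best reward and zero cost
    radius = math.sqrt(1.5 * math.log(t + 1) / stats.pulls)  # assumed constant
    return min(1.0, stats.mean_reward + radius), max(0.0, stats.mean_cost - radius)


def solve_relaxation(rewards, costs, budget):
    """Fractional knapsack: max sum z_i*r_i s.t. sum z_i*c_i <= budget, 0 <= z_i <= 1."""
    order = sorted(
        range(len(rewards)),
        key=lambda i: rewards[i] / costs[i] if costs[i] > 0 else math.inf,
        reverse=True,
    )
    z = [0.0] * len(rewards)
    remaining = budget
    for i in order:
        if costs[i] <= remaining:
            z[i] = 1.0
            remaining -= costs[i]
        else:
            z[i] = remaining / costs[i]  # fractional share of the last arm
            break
    return z


def round_selection(z, rng):
    """Independent randomized rounding: keep arm i with probability z[i]."""
    return [i for i, p in enumerate(z) if rng.random() < p]


def run(true_rewards, true_costs, budget, rounds, seed=0):
    """Simulate the loop with Bernoulli rewards and fixed per-arm costs."""
    rng = random.Random(seed)
    arms = [ArmStats() for _ in true_rewards]
    for t in range(1, rounds + 1):
        bounds = [optimistic_estimates(a, t) for a in arms]
        z = solve_relaxation([b[0] for b in bounds], [b[1] for b in bounds], budget)
        for i in round_selection(z, rng):
            reward = 1.0 if rng.random() < true_rewards[i] else 0.0
            arms[i].update(reward, true_costs[i])  # online feedback step
    return arms
```

Calling, for example, `run([0.9, 0.2], [0.5, 0.5], budget=0.5, rounds=200)` returns the per-arm statistics. Because rounding is randomized, a single round can exceed the budget even though the relaxed solution respects it in expectation, which is why constraint violations are tracked alongside regret in this kind of analysis.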