With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly across different LLMs. To tackle these challenges, we introduce \textit{C2MAB-V}, a \underline{C}ost-effective \underline{C}ombinatorial \underline{M}ulti-armed \underline{B}andit with \underline{V}ersatile reward models for optimal LLM selection and usage. This online model differs from traditional static approaches, and from those that rely on a single LLM without considering cost. With multiple LLMs deployed on a scheduling cloud and a local server dedicated to handling user queries, \textit{C2MAB-V} selects multiple LLMs over a combinatorial search space, tailored to various collaborative task types with different reward models. Building on our designed online feedback mechanism and confidence bound technique, \textit{C2MAB-V} effectively addresses the multi-LLM selection challenge by managing the exploration-exploitation trade-off across models while balancing cost and reward for diverse tasks. We solve the resulting NP-hard integer linear program for selecting multiple LLMs under these trade-offs by: i) having the local server decompose the integer problem into a relaxed continuous form, ii) having the scheduling cloud apply a discretization rounding scheme that provides optimal LLM combinations, and iii) continually updating the model online based on feedback. Theoretically, we prove that \textit{C2MAB-V} offers strict guarantees over versatile reward models, matching state-of-the-art regret and violation bounds in some degenerate cases. Empirically, we show that \textit{C2MAB-V} effectively balances performance and cost-efficiency with nine LLMs across three application scenarios.
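To make the relax-round-update pipeline concrete, the following is a minimal sketch, not the paper's actual algorithm: unknown per-LLM rewards are estimated with UCB-style confidence bounds, the relaxed continuous problem is a fractional knapsack (solved greedily, which is optimal for the LP relaxation), and the fractional solution is discretized by randomized rounding. All names (`fractional_knapsack`, `run_c2mab_v_sketch`), the budget constraint, and the Bernoulli reward model are illustrative assumptions.

```python
import math
import random

def fractional_knapsack(values, costs, budget):
    """Relaxed LP: maximize sum(x_k * values[k]) subject to
    sum(x_k * costs[k]) <= budget and 0 <= x_k <= 1.
    The greedy fill by value/cost ratio is optimal for this relaxation."""
    order = sorted(range(len(values)),
                   key=lambda k: values[k] / costs[k], reverse=True)
    x = [0.0] * len(values)
    remaining = budget
    for k in order:
        take = min(1.0, remaining / costs[k])
        if take <= 0:
            break
        x[k] = take
        remaining -= take * costs[k]
    return x

def run_c2mab_v_sketch(true_means, costs, budget, horizon, rng):
    """Illustrative loop: UCB estimates -> LP relaxation -> randomized
    rounding -> online update from observed (Bernoulli) rewards."""
    n = len(true_means)
    counts = [0] * n      # pulls per LLM
    means = [0.0] * n     # empirical mean reward per LLM
    for t in range(1, horizon + 1):
        # Optimistic reward estimates; unpulled arms get the maximal bound.
        ucb = [means[k] + math.sqrt(1.5 * math.log(t) / counts[k])
               if counts[k] else float("inf") for k in range(n)]
        ucb = [min(u, 1.0) for u in ucb]  # rewards live in [0, 1]
        # Step i) relaxed continuous problem.
        x = fractional_knapsack(ucb, costs, budget)
        # Step ii) discretization via randomized rounding: include LLM k
        # with probability x_k (keeps the budget satisfied in expectation).
        chosen = [k for k in range(n) if rng.random() < x[k]]
        # Step iii) online update from feedback.
        for k in chosen:
            r = 1.0 if rng.random() < true_means[k] else 0.0
            counts[k] += 1
            means[k] += (r - means[k]) / counts[k]
    return means, counts
```

This toy version handles only a single budget constraint and independent Bernoulli rewards; the paper's setting covers versatile reward models and collaborative task structures beyond this sketch.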