Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs based on the approximate correctness of outputs from a smaller LM. Central to AutoMix are two key technical contributions. First, it employs a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring extensive training. Second, because self-verification can be noisy, it uses a POMDP-based router that selects an appropriately sized model based on answer confidence. Experiments across five language models and five challenging datasets show that AutoMix consistently surpasses strong baselines, reducing computational cost by over 50% for comparable performance.
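The routing idea in the abstract can be illustrated with a minimal sketch. Note this is a simplification under stated assumptions: the paper's router is a POMDP over noisy verifier observations, whereas the sketch below uses a plain confidence threshold, and `self_verify`, `small_lm`, and `large_lm` are hypothetical stand-ins for the few-shot verifier prompt and the actual model APIs.

```python
def self_verify(context: str, answer: str) -> float:
    """Hypothetical stand-in for AutoMix's few-shot self-verifier.

    In the paper, the smaller LM is prompted (few-shot) to judge whether
    `answer` is supported by `context`; sampling several judgments yields
    a confidence in [0, 1]. Here we return a fixed placeholder score.
    """
    return 0.5  # placeholder confidence


def route_query(context: str, question: str,
                small_lm, large_lm, threshold: float = 0.7) -> str:
    """Threshold-based sketch of confidence-driven routing.

    Answer with the cheap model first, self-verify, and escalate to the
    larger model only when confidence falls below `threshold`. AutoMix
    replaces this fixed-threshold rule with a POMDP-based policy that
    accounts for verifier noise.
    """
    answer = small_lm(context, question)
    confidence = self_verify(context, answer)
    if confidence >= threshold:
        return answer                        # accept the cheap answer
    return large_lm(context, question)       # escalate to the larger LM
```

For example, with the placeholder confidence of 0.5, a threshold of 0.7 escalates every query to the large model, while a threshold of 0.4 accepts every small-model answer; a real verifier would make this decision per query.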