Large language models (LLMs) are now available from cloud API providers in a range of sizes and configurations. While this diversity offers a broad spectrum of choices, using those options effectively to balance computational cost against performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, in which a model estimates the reliability of its own outputs without requiring any training. Because these verifications can be noisy, AutoMix employs a meta verifier to refine the accuracy of the assessments. Our experiments using LLAMA2-13/70B on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 89%. Our code and data are available at https://github.com/automix-llm/automix.
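The routing idea described above can be illustrated with a minimal sketch. It is not the released AutoMix implementation: the function `route_query`, its parameters, and the callables `small_lm`, `large_lm`, and `verify` are hypothetical names introduced here for illustration, and a fixed confidence threshold is assumed as a simple stand-in for the meta verifier.

```python
from typing import Callable, Tuple


def route_query(
    query: str,
    small_lm: Callable[[str], str],       # hypothetical: returns the small model's answer
    large_lm: Callable[[str], str],       # hypothetical: returns the large model's answer
    verify: Callable[[str, str], bool],   # hypothetical: one noisy self-verification verdict
    n_votes: int = 8,
    threshold: float = 0.5,
) -> Tuple[str, str, float]:
    """Answer with the small model and escalate only when confidence is low."""
    if n_votes <= 0:
        raise ValueError("n_votes must be positive")

    draft = small_lm(query)

    # A single verification verdict can be noisy, so sample several
    # verdicts and use the fraction of positive ones as a confidence score.
    votes = [verify(query, draft) for _ in range(n_votes)]
    confidence = sum(votes) / n_votes

    # Assumption: a fixed threshold stands in for the meta verifier.
    if confidence >= threshold:
        return draft, "small", confidence
    return large_lm(query), "large", confidence
```

In this sketch the larger model is called only when the averaged verdicts fall below `threshold`, so the threshold controls the trade-off between cost and accuracy.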