Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging these options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, which estimates the reliability of the smaller LM's own outputs without requiring training. Given that verifications can be noisy, we employ a meta verifier in AutoMix to refine the accuracy of these assessments. Our experiments using LLaMA2-13B/70B on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 89%. Our code and data are available at https://github.com/automix-llm/automix.
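The routing idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the AutoMix implementation from the repository. The model interface (plain prompt-to-text callables), the verification prompt wording, the number of verifier samples, and the single confidence threshold that stands in for the meta verifier are all hypothetical choices. The `incremental_benefit_per_cost` helper assumes that metric is the performance gained over the small LM divided by the extra cost incurred.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a "model" is any function from prompt text to
# completion text. Real API clients would need to be wrapped to fit it.
LM = Callable[[str], str]


@dataclass
class RoutingResult:
    answer: str
    confidence: float  # estimated probability that the small-LM answer is correct
    escalated: bool    # True if the query was sent to the large LM


def verification_confidence(verdicts: List[str]) -> float:
    """Fraction of sampled verifier outputs that judge the answer correct."""
    if not verdicts:
        return 0.0
    votes = [v.strip().lower().startswith("yes") for v in verdicts]
    return sum(votes) / len(votes)


def route(context: str, question: str, small_lm: LM, large_lm: LM,
          num_samples: int = 8, threshold: float = 0.5) -> RoutingResult:
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    answer = small_lm(prompt)

    # Self-verification: the small LM judges its own answer against the
    # context. The prompt wording is a placeholder; in practice it would
    # carry few-shot examples, and a real client would sample with
    # temperature > 0 so that repeated calls give different verdicts.
    verify_prompt = (
        f"Context: {context}\nQuestion: {question}\n"
        f"AI Generated Answer: {answer}\n"
        "Is the answer correct given the context? Answer yes or no:"
    )
    verdicts = [small_lm(verify_prompt) for _ in range(num_samples)]
    confidence = verification_confidence(verdicts)

    # Stand-in meta verifier: a single confidence threshold decides whether
    # to keep the small-LM answer or escalate to the large LM.
    if confidence >= threshold:
        return RoutingResult(answer, confidence, escalated=False)
    return RoutingResult(large_lm(prompt), confidence, escalated=True)


def incremental_benefit_per_cost(perf: float, cost: float,
                                 perf_small: float, cost_small: float) -> float:
    """Performance gained over the small LM per unit of extra cost (assumed definition)."""
    return (perf - perf_small) / (cost - cost_small)
```

For example, with stub models where the small LM always answers "Paris" and always verifies its answer as "no", `route(...)` returns the large LM's answer with `escalated=True` and `confidence=0.0`. Thresholding the verifier's vote fraction keeps the sketch simple; a learned or probabilistic meta verifier could replace the `if confidence >= threshold` check without changing the rest of the flow.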