Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging these options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, which estimates the reliability of the smaller LM's own outputs without requiring training. Because verifications can be noisy, AutoMix employs a meta-verifier to refine the accuracy of these assessments. Our experiments using LLAMA2-13B and GPT-4 on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 86%. Our code and data are available at https://github.com/automix-llm/automix.
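The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the released AutoMix implementation: the function names, the repeated sampling of verification votes, and the fixed-threshold meta-verifier are all assumptions made for illustration, and the three stub functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures: (context, question) -> answer, and
# (context, question, answer) -> True if the answer is judged correct.
Model = Callable[[str, str], str]
Verifier = Callable[[str, str, str], bool]


@dataclass
class RoutingResult:
    answer: str
    used_large: bool
    confidence: float


def self_verify_confidence(verify_once: Verifier, context: str, question: str,
                           answer: str, num_samples: int = 8) -> float:
    """Estimate reliability as the fraction of verification calls that accept
    the answer. Sampling several votes is an assumption of this sketch."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    votes = sum(1 for _ in range(num_samples)
                if verify_once(context, question, answer))
    return votes / num_samples


def threshold_meta_verifier(confidence: float, threshold: float = 0.5) -> bool:
    """Assumed meta-verifier: accept the small-model answer when the noisy
    confidence estimate reaches a fixed threshold."""
    return confidence >= threshold


def route(context: str, question: str, small_lm: Model, large_lm: Model,
          verify_once: Verifier, threshold: float = 0.5,
          num_samples: int = 8) -> RoutingResult:
    """Answer with the small model first; escalate to the large model only
    when the meta-verifier rejects the small model's answer."""
    draft = small_lm(context, question)
    confidence = self_verify_confidence(verify_once, context, question,
                                        draft, num_samples)
    if threshold_meta_verifier(confidence, threshold):
        return RoutingResult(draft, False, confidence)
    return RoutingResult(large_lm(context, question), True, confidence)


# Deterministic stand-ins for real model calls (hypothetical, for demonstration).
def stub_small_lm(context: str, question: str) -> str:
    return "Paris" if "capital" in question else "unsure"


def stub_large_lm(context: str, question: str) -> str:
    return "large-model answer"


def stub_verifier(context: str, question: str, answer: str) -> bool:
    return answer != "unsure"
```

With these stubs, `route("France is a country.", "What is the capital?", stub_small_lm, stub_large_lm, stub_verifier)` keeps the small model's answer, while a question the small model cannot answer is escalated to the large model. In practice the threshold would be tuned to trade cost against accuracy.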