Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models achieving better performance at higher computational cost. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing how confident a model is in a task and how likely its response is to be accurate, our method retains the tasks that are likely to be solved correctly and delegates the more uncertain or complex cases to a larger model, ensuring reliability while minimizing computation. Specifically, we estimate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach matches the accuracy of the largest model while reducing computational costs by 20\% to 40\%. When applied to GPT-4o API calls, it reduces token usage by approximately 60\%, further improving cost efficiency. These findings highlight the potential of confidence-based model selection to improve real-world LLM deployment, particularly in resource-constrained settings such as edge devices and commercial API applications.
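The routing policy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `threshold`, the `(answer, confidence)` model interface, and the toy models are all hypothetical placeholders for the confidence estimators the abstract refers to.

```python
# Hypothetical sketch of confidence-based model routing: keep the small
# model's answer when its confidence clears a threshold, otherwise
# delegate the query to the larger (more expensive) model.

def route(question, small_model, large_model, threshold=0.8):
    """Answer with the small model when confident; otherwise delegate.

    Each model is assumed to return (answer, confidence), where
    confidence is an estimate of P(answer is correct).
    """
    answer, confidence = small_model(question)
    if confidence >= threshold:
        return answer, "small"   # cheap path: small model is trusted
    answer, _ = large_model(question)
    return answer, "large"       # expensive path: delegated

# Toy stand-ins for the two models (purely illustrative).
def small_model(q):
    return ("A", 0.95) if "easy" in q else ("B", 0.40)

def large_model(q):
    return ("C", 0.99)

print(route("easy question", small_model, large_model))  # ('A', 'small')
print(route("hard question", small_model, large_model))  # ('C', 'large')
```

In practice the confidence signal might come from token log-probabilities or a calibrated self-assessment prompt; the threshold trades accuracy against the fraction of queries escalated to the large model.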