Large language models (LLMs) have revolutionized the landscape of Natural Language Processing systems, but they are computationally expensive. To reduce this cost without sacrificing performance, previous studies have explored various ways to use Small Language Models (SLMs) as cost-effective alternatives to their larger counterparts. Motivated by the finding that SLMs and LLMs exhibit complementary strengths in a structured knowledge extraction task, this work presents a novel SLM/LLM routing framework designed to improve both computational efficiency and task performance. First, exemplar pools are created to represent the types of contexts in which each LM provides the more reliable answer, using a sentence embedding fine-tuned so that similarity between contexts approximates similarity between dialogue states. Then, during inference, the k nearest exemplars to the test instance are retrieved, and the instance is routed to the SLM or the LLM by majority vote. In dialogue state tracking tasks, the proposed routing framework improves performance substantially compared to relying solely on LLMs, while reducing computational costs by over 50%.
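The inference-time routing step described in the abstract (retrieve the k nearest exemplars, then take a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `route`, the labels `"SLM"` and `"LLM"`, the use of cosine similarity, the tie-breaking rule (ties go to the LLM), and the toy 2-D vectors standing in for fine-tuned sentence embeddings are all assumptions made for illustration.

```python
import numpy as np


def route(query_emb, exemplar_embs, exemplar_labels, k=5):
    """Pick "SLM" or "LLM" by majority vote over the k nearest exemplars.

    Assumptions (not specified in the abstract): cosine similarity is the
    nearness measure, and ties are resolved in favour of the LLM.
    """
    # Normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    e = exemplar_embs / np.linalg.norm(exemplar_embs, axis=1, keepdims=True)
    sims = e @ q

    # Never ask for more neighbours than there are exemplars.
    k = min(k, len(exemplar_labels))
    nearest = np.argsort(-sims)[:k]

    slm_votes = sum(1 for i in nearest if exemplar_labels[i] == "SLM")
    llm_votes = k - slm_votes
    return "SLM" if slm_votes > llm_votes else "LLM"


# Hypothetical exemplar pool: toy 2-D vectors in place of real embeddings.
# Each label names the model assumed to be more reliable for that context.
POOL = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],   # contexts where the SLM was reliable
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],   # contexts where the LLM was reliable
])
LABELS = ["SLM", "SLM", "SLM", "LLM", "LLM", "LLM"]

decision = route(np.array([1.0, 0.05]), POOL, LABELS, k=3)
```

In this sketch a query is sent to the cheaper SLM only when a strict majority of its neighbours come from the SLM pool; how the exemplar pools are built and how the embedding is fine-tuned are outside what the sketch covers.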