Despite the impressive performance of large language models (LLMs), they often lag behind specialized models on a range of tasks. LLMs use only a small fraction of the available training data for in-context learning, whereas task-specific models exploit the full dataset during fine-tuning. In this work, we tackle the problem of leveraging training data to improve the performance of LLMs without fine-tuning them. Our approach operates directly on LLM predictions and does not require access to the model weights. We create a pool of candidates from the LLM through few-shot prompting and employ a compact model, the LM-corrector (LMCor), which is trained specifically to merge these candidates into an improved output. Our experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M parameters) substantially improves the few-shot performance of LLMs (62B parameters), matching and even outperforming standard fine-tuning. Furthermore, we show that LMCor is robust to different prompts, which reduces the need for extensive prompt engineering. Finally, we demonstrate that LMCor can be seamlessly integrated with different LLMs at inference time, serving as a plug-and-play module to improve their performance.
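The candidate-then-correct pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_llm` and `corrector` are hypothetical callables standing in for the few-shot-prompted LLM and the trained LMCor model, and the `source: ... candidate i: ...` string is an assumed way of concatenating the task input with the candidate pool before passing it to the corrector.

```python
from itertools import cycle
from typing import Callable, List


def build_corrector_input(source: str, candidates: List[str]) -> str:
    """Concatenate the task input with numbered LLM candidates.

    The exact format is a hypothetical choice for illustration.
    """
    parts = [f"source: {source}"]
    for i, cand in enumerate(candidates, start=1):
        parts.append(f"candidate {i}: {cand}")
    return " ".join(parts)


def lmcor_pipeline(
    source: str,
    few_shot_prompt: str,
    sample_llm: Callable[[str], str],   # hypothetical: one sampled LLM output per call
    corrector: Callable[[str], str],    # hypothetical: the trained corrector model
    num_candidates: int = 3,
) -> str:
    """Sample a candidate pool from the LLM, then let the corrector merge it."""
    prompt = f"{few_shot_prompt}\n{source}"
    # The LLM is used as a black box: only its text outputs are needed, not its weights.
    candidates = [sample_llm(prompt) for _ in range(num_candidates)]
    return corrector(build_corrector_input(source, candidates))


if __name__ == "__main__":
    # Toy stand-ins: a fixed rotation of outputs instead of real LLM sampling,
    # and an identity function instead of a trained corrector.
    fake_outputs = cycle(["the cat sat", "a cat sits", "the cat sits"])
    sample_llm = lambda prompt: next(fake_outputs)
    corrector = lambda text: text
    print(lmcor_pipeline("le chat est assis", "Translate French to English:",
                         sample_llm, corrector))
```

With the identity stub, the script prints the corrector's input, which makes the assumed candidate-pool format visible; a real LMCor model would instead map that input to a single improved output.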