Learning paradigms for large language models (LLMs) currently tend to fall into one of two categories: in-context learning (ICL) or full fine-tuning. Each comes with its own trade-offs in available data, model size, compute cost, ease of use, and final quality, and neither performs well across the board. In this article, we first describe the ICL and fine-tuning paradigms in a way that highlights their natural connections. Based on these connections, we propose a new learning paradigm called FIAT that fuses the best of both. FIAT enables prompt-engineered instructions and chain-of-thought reasoning with the very largest models, while also using similar methods to perform parameter updates on a modestly sized LLM with parameter-efficient tuning. We evaluate FIAT's effectiveness on a variety of multilingual tasks and observe that FIAT outperforms both ICL and fine-tuning at scales ranging from 100 to 10,000 training examples. We hope that FIAT provides a practical way of harnessing the full potential of LLMs without needing to make a hard choice between learning paradigms.