Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi

Large language models excel at a variety of language tasks when prompted with examples or instructions. Yet controlling these models through prompting alone is limited. Tailoring language models through fine-tuning (e.g., via reinforcement learning) can be effective, but it is expensive and requires model access. We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model such as GPT-3 without fine-tuning it. IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective with reinforcement learning. On five challenging text generation tasks, such as toxicity reduction and open-domain generation, IPA consistently brings significant improvements over off-the-shelf language models. It outperforms competitive baseline methods, sometimes even including expensive fine-tuning. In particular, tailoring GPT-2 with IPA can outperform GPT-3, while tailoring GPT-3 with IPA brings a major performance boost over GPT-3 (and sometimes even over GPT-4). Our promising results highlight the potential of IPA as a lightweight alternative for tailoring extreme-scale language models.
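
To make the decoding-time guidance concrete, here is a minimal sketch of one way a small adapter could steer a frozen base model at each decoding step. The abstract does not spell out how the two models are combined, so the product-of-experts rule used here (adding log-probabilities and renormalizing) is an assumption. The function names, the three-word vocabulary, and the logit values are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def combine(base_logits, adapter_logits):
    """Assumed combination rule (product of experts):
    p(token) is proportional to p_base(token) * p_adapter(token).
    The base model is only queried for logits; it is never updated."""
    base_lp = log_softmax(base_logits)
    adapter_lp = log_softmax(adapter_logits)
    return log_softmax([b + a for b, a in zip(base_lp, adapter_lp)])


def greedy_next_token(base_logits, adapter_logits):
    """Pick the most likely next-token index under the combined distribution."""
    combined = combine(base_logits, adapter_logits)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical toy vocabulary and next-token logits for a single decoding step.
VOCAB = ["nice", "rude", "okay"]
base_logits = [1.0, 2.0, 0.5]      # the frozen base model prefers "rude"
adapter_logits = [1.0, -3.0, 0.5]  # the small adapter strongly penalizes "rude"

base_choice = max(range(len(VOCAB)), key=base_logits.__getitem__)
guided_choice = greedy_next_token(base_logits, adapter_logits)
print(VOCAB[base_choice], VOCAB[guided_choice])
```

In this sketch only the adapter would carry trainable parameters, so reinforcement learning against a user objective (for example, a toxicity score) would update the adapter while the base model is accessed purely through its output logits.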

