Text Alignment Is An Efficient Unified Model for Massive NLP Tasks

Large language models (LLMs), typically built around next-word prediction, have excelled across a wide range of NLP tasks. Despite its generality, next-word prediction is often an inefficient formulation for many of these tasks: it demands an extreme scale of model parameters (tens or hundreds of billions) and sometimes yields suboptimal performance. In practice, it is often desirable to build more efficient models. Although less versatile, such models still apply to a substantial subset of problems and deliver on-par or even superior performance with much smaller model sizes. In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks, including textual entailment, similarity, question answering (and answerability), and factual consistency. Given a pair of texts, the model measures the degree of alignment between their information. We instantiate an alignment model (Align) through lightweight finetuning of RoBERTa (355M parameters) on 5.9M examples from 28 datasets. Extensive experiments show that, despite its compact size, the model is efficient and performs strongly: (1) on over 20 datasets spanning the tasks above, the model matches or surpasses FLAN-T5 models with around 2x or 10x more parameters, and the single unified model also outperforms task-specific models finetuned on individual datasets; (2) when applied to evaluate the factual consistency of language generation on 23 datasets, our model improves over various baselines, including the much larger GPT-3.5 (ChatGPT) and sometimes even GPT-4; (3) the lightweight model can also serve as an add-on component for LLMs such as GPT-3.5 in question answering tasks, improving the average exact match (EM) score by 17.94 and the F1 score by 15.05 by identifying unanswerable questions.
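
The abstract states point (3) only at a high level: an alignment score is used to flag unanswerable questions so that an LLM's answer can be withheld. The Python sketch below illustrates that idea under explicit assumptions. `toy_align` is a hypothetical token-overlap stand-in for the finetuned RoBERTa model, not the paper's scorer. `answer_or_abstain` and its 0.5 threshold are illustrative names and values. Pairing the context with the question plus the LLM's answer is an assumed input format. EM and F1 follow the SQuAD 2.0-style convention in which an unanswerable question has an empty gold answer.

```python
import re
import string
from collections import Counter
from typing import Callable

# Hypothetical interface: any function mapping (text_a, text_b) to a score in
# [0, 1], higher meaning text_b is better supported by text_a.
AlignFn = Callable[[str, str], float]


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def toy_align(text_a: str, text_b: str) -> float:
    """Toy proxy (NOT the Align model): share of text_b tokens found in text_a."""
    tokens_a = set(normalize(text_a).split())
    tokens_b = normalize(text_b).split()
    if not tokens_b:
        return 0.0
    return sum(t in tokens_a for t in tokens_b) / len(tokens_b)


def answer_or_abstain(context: str, question: str, llm_answer: str,
                      align: AlignFn, threshold: float = 0.5) -> str:
    """Keep the LLM's answer only if the context aligns with question + answer.

    Returns "" (the 'unanswerable' prediction) otherwise. The input pairing
    and the threshold are assumptions of this sketch.
    """
    score = align(context, f"{question} {llm_answer}")
    return llm_answer if score >= threshold else ""


def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))


def f1(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    if not pred_toks or not gold_toks:
        # Empty gold marks an unanswerable question: only an empty prediction scores.
        return float(pred_toks == gold_toks)
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


context = "The Eiffel Tower is located in Paris and was completed in 1889."
# Supported by the context, so the answer is kept.
kept = answer_or_abstain(context, "Where is the Eiffel Tower?", "Paris", toy_align)
# The context says nothing about the architect, so the sketch abstains.
dropped = answer_or_abstain(context, "Where was its architect born?", "Dijon", toy_align)
print(kept, repr(dropped))
print(exact_match(dropped, ""), f1(dropped, ""))
```

Under this convention, an unsupported answer to an unanswerable question scores 0 on both metrics, while a correct abstention scores 1. Filtering answers with an alignment score is therefore one plausible route to gains of the kind reported above. The actual effect depends on the real model, the threshold, and the datasets.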
