Embedding Recycling for Language Models

Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have been proposed, their practical effectiveness is still unknown because existing evaluations consider very few models and do not adequately account for overhead costs. We perform an extensive evaluation of ER across eight different models (17 to 900 million parameters) and fourteen tasks in English. We show how a simple ER technique that caches activations from an intermediate layer of a pretrained model, and learns task-specific adapters on the later layers, is broadly effective. For the best-performing baseline in our experiments (DeBERTa-v2 XL), adding a precomputed cache results in a >90% speedup during training and 87-91% speedup for inference, with negligible impact on accuracy. Our analysis reveals important areas of future work.
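The caching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy "model" whose layers are simple elementwise functions, and every name in it (`toy_embed`, `make_layer`, `RecyclingEncoder`, `cache_layer`) is hypothetical. The lower layers run once per input and their output is stored. Later runs reuse the stored activations and execute only the upper layers, which stand in for the task-specific part of the model.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an input embedding: two crude text features."""
    return [float(len(text)), float(text.count(" "))]


def make_layer(scale: float, shift: float) -> Layer:
    """Toy stand-in for a transformer layer: an elementwise affine map."""
    def layer(x: Vector) -> Vector:
        return [scale * v + shift for v in x]
    return layer


class RecyclingEncoder:
    """Caches the activations produced after the first `cache_layer` layers."""

    def __init__(self, layers: List[Layer], cache_layer: int) -> None:
        self.lower = layers[:cache_layer]   # run once per input, then cached
        self.upper = layers[cache_layer:]   # run on every call
        self.cache: Dict[str, Vector] = {}
        self.lower_calls = 0                # counts cache misses

    @staticmethod
    def _key(text: str) -> str:
        # Key the cache by a hash of the raw input text.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def intermediate(self, text: str) -> Vector:
        key = self._key(text)
        if key not in self.cache:
            self.lower_calls += 1
            x = toy_embed(text)
            for layer in self.lower:
                x = layer(x)
            self.cache[key] = x
        return self.cache[key]

    def forward(self, text: str) -> Vector:
        # Copy so the upper layers can never mutate the cached activations.
        x = list(self.intermediate(text))
        for layer in self.upper:
            x = layer(x)
        return x


if __name__ == "__main__":
    layers = [make_layer(2.0, 1.0) for _ in range(4)]
    encoder = RecyclingEncoder(layers, cache_layer=2)
    first = encoder.forward("hello world")
    second = encoder.forward("hello world")  # served from the cache
    print(first == second, encoder.lower_calls)
```

In this sketch the saving comes from skipping the lower layers on every repeated input. A real system would also have to pay for storing and loading the cached activations, which is the overhead cost the abstract says earlier evaluations did not adequately account for.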

