Despite the massive success of fine-tuning Pre-trained Language Models (PLMs), they remain susceptible to out-of-distribution inputs. Dataset cartography is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs. It involves fine-tuning a model on the original training set (i.e., the reference model), selecting a subset of important training instances based on the training dynamics, and fine-tuning again only on these selected examples (i.e., the main model). However, this approach requires fine-tuning the same model twice, which is computationally expensive for large PLMs. In this paper, we show that (1) training dynamics are highly transferable across model sizes and pre-training methods, and that (2) fine-tuning main models on these selected training instances achieves higher training efficiency than empirical risk minimization (ERM). Building on these observations, we propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT). Compared with dataset cartography, FTFT uses more efficient reference models and aggressive early stopping. FTFT achieves robustness improvements over ERM while lowering the training cost by up to $\sim 50\%$.
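The dual-model recipe above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name, the selection fraction, and the heuristic of ranking examples by the variability of the gold-label probability across epochs (a common choice in dataset cartography) are all assumptions.

```python
# Hypothetical sketch of instance selection from training dynamics.
# Assumption: we rank examples by the variability (std across epochs)
# of the reference model's gold-label probability, and keep the most
# "ambiguous" ones for fine-tuning the main model.
import numpy as np

def select_by_training_dynamics(gold_probs, fraction=0.33):
    """gold_probs: array of shape (num_epochs, num_examples), the
    reference model's probability of the gold label after each epoch.
    Returns (selected indices, per-example confidence, variability)."""
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # spread across epochs
    k = int(fraction * gold_probs.shape[1])
    # keep the examples the reference model is least certain about
    return np.argsort(-variability)[:k], confidence, variability

# Toy usage: 4 epochs, 6 training examples
probs = np.array([
    [0.90, 0.20, 0.5, 0.95, 0.4, 0.70],
    [0.92, 0.30, 0.7, 0.96, 0.2, 0.75],
    [0.95, 0.25, 0.4, 0.97, 0.6, 0.80],
    [0.93, 0.35, 0.8, 0.98, 0.3, 0.85],
])
idx, conf, var = select_by_training_dynamics(probs, fraction=0.5)
```

In FTFT, `gold_probs` would come from a cheaper reference model than the main model, which is what removes most of the duplicated fine-tuning cost.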