Fine-tuning language models on a downstream task is the standard approach behind many state-of-the-art methods in NLP. However, when the data distribution drifts between the source task and the target task, \textit{e.g.}, in conversational environments, these gains tend to diminish. This article proposes a sequence of pre-training steps (a curriculum), guided by "data hacking" and grammar analysis, that allows for further, gradual adaptation between pre-training distributions. In our experiments on the MultiWoZ task, our method yields a considerable improvement over other known pre-training approaches.