FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

Continued pre-training (CP) offers multiple advantages, such as target-domain adaptation and the potential to exploit the continuous stream of unlabeled data available online. However, continued pre-training on out-of-domain distributions often causes catastrophic forgetting of previously acquired knowledge, which results in sub-optimal ASR performance. This paper presents FusDom, a simple and novel methodology for SSL-based continued pre-training. FusDom learns speech representations that are robust and adaptive yet do not forget concepts seen in the past. Instead of solving the SSL pre-text task on the output representations of a single model, FusDom leverages two identical pre-trained SSL models, a teacher and a student, with a modified pre-training head to solve the CP SSL pre-text task. This head employs a cross-attention mechanism between the representations of both models; only the student receives gradient updates, and the teacher does not. Finally, the student is fine-tuned for ASR. In practice, FusDom significantly outperforms all our baselines across settings, with improvements of 0.2 to 7.3 WER in the target domain while retaining performance in the earlier domain.
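The teacher/student arrangement described in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: `TinyEncoder` is a hypothetical stand-in for a pre-trained SSL speech model, the student representations are used as attention queries and the teacher representations as keys and values (the abstract does not say which role each plays), the head is trained together with the student, and a mean-squared-error loss against random targets stands in for the actual SSL pre-text objective.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pre-trained SSL speech encoder."""

    def __init__(self, feat_dim=16, hidden_dim=32):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, x):  # x: (batch, time, feat_dim)
        return torch.tanh(self.proj(x))


class CrossAttentionHead(nn.Module):
    """Assumed head layout: student frames attend over teacher frames."""

    def __init__(self, hidden_dim=32, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, student_repr, teacher_repr):
        # Assumption: queries come from the student, keys/values from the teacher.
        fused, _ = self.attn(query=student_repr, key=teacher_repr, value=teacher_repr)
        return self.out(fused)


# Two identical copies of the same (stand-in) pre-trained model.
student = TinyEncoder()
teacher = copy.deepcopy(student)
teacher.requires_grad_(False)  # the teacher never receives gradient updates
teacher.eval()

head = CrossAttentionHead()
# Only the student and the head are handed to the optimizer.
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(head.parameters()), lr=0.1
)

x = torch.randn(2, 10, 16)       # fake batch of target-domain features
target = torch.randn(2, 10, 32)  # placeholder for the SSL pre-text targets

with torch.no_grad():            # no graph is built through the teacher
    teacher_repr = teacher(x)
student_repr = student(x)

fused = head(student_repr, teacher_repr)
loss = nn.functional.mse_loss(fused, target)  # placeholder pre-text loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After one step the student's weights have moved away from the teacher's, while the teacher still holds the original pre-trained weights. In the pipeline the abstract describes, the student would then be fine-tuned for ASR.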

