Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

Continued self-supervised (SSL) pre-training, which adapts an existing SSL model to the target domain, has been shown to be extremely effective for low-resource Automatic Speech Recognition (ASR). This paper proposes Stable Distillation, a simple and novel approach to SSL-based continued pre-training that boosts ASR performance in target domains where both labeled and unlabeled data are limited. Stable Distillation uses self-distillation as a regularizer for continued pre-training, alleviating over-fitting, a common problem for continued pre-training when the source and target domains differ. Specifically, we first perform vanilla continued pre-training of an initial SSL pre-trained model on the target-domain ASR dataset and call the result the teacher. Next, we take the same initial pre-trained model as a student and perform continued pre-training on it while constraining its hidden representations to stay close to those of the teacher (via an MSE loss). This student is then used for downstream ASR fine-tuning on the target dataset. In practice, Stable Distillation outperforms all our baselines by 0.8 to 7 WER across various experimental settings.
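The student objective described above can be sketched as the usual SSL pre-training loss plus an MSE term that pulls the student's hidden representations toward the teacher's. The sketch below is only illustrative and uses plain Python lists in place of tensors. The function names, the `distill_weight` parameter, and the simple additive combination of the two terms are assumptions made for this example; the abstract does not say how the terms are weighted or which layers are matched.

```python
from typing import Sequence

Frame = Sequence[float]


def mse(student_hidden: Sequence[Frame], teacher_hidden: Sequence[Frame]) -> float:
    """Mean squared error over all frames and feature dimensions."""
    if len(student_hidden) != len(teacher_hidden):
        raise ValueError("student and teacher must have the same number of frames")
    total = 0.0
    count = 0
    for s_frame, t_frame in zip(student_hidden, teacher_hidden):
        if len(s_frame) != len(t_frame):
            raise ValueError("student and teacher frames must have the same size")
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("hidden representations must not be empty")
    return total / count


def stable_distillation_loss(
    ssl_loss: float,
    student_hidden: Sequence[Frame],
    teacher_hidden: Sequence[Frame],
    distill_weight: float = 1.0,  # hypothetical weighting, not given in the abstract
) -> float:
    """SSL loss regularized by the distance to the frozen teacher's representations."""
    return ssl_loss + distill_weight * mse(student_hidden, teacher_hidden)


if __name__ == "__main__":
    # Toy values: one frame with two features.
    student = [[1.0, 2.0]]
    teacher = [[1.0, 4.0]]
    print(stable_distillation_loss(0.5, student, teacher, distill_weight=0.5))
```

In a real training loop the teacher would be frozen after its own continued pre-training, and only the student would receive gradients from this combined loss.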

