In this paper, we explore several new schemes for training a seq2seq model to integrate a pre-trained language model (LM). Our proposed fusion methods focus on the memory cell state and the hidden state of the long short-term memory (LSTM) in the seq2seq decoder, and, unlike prior studies, the memory cell state is updated by the LM. This means that the memory retained by the main seq2seq model is adjusted by the external LM. These fusion methods have several variants, depending on the architecture of the memory cell update and on the use of the memory cell and hidden states, which directly affects the final label inference. We performed experiments to show the effectiveness of the proposed methods in a monolingual ASR setup on the Librispeech corpus and in a transfer learning setup from a multilingual ASR (MLASR) base model to a low-resource language. On Librispeech, with multi-level decoding, our best model improved WER relatively by 3.7% and 2.4% on test-clean and test-other, respectively, over the shallow fusion baseline. In transfer learning from an MLASR base model to the IARPA Babel Swahili model, the best scheme improved the transferred model on the eval set relatively by 9.9% and 9.8% in CER and WER, respectively, over the 2-stage transfer baseline.
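The core idea above, letting the external LM write into the decoder LSTM's memory cell rather than only combining output probabilities, can be sketched as a gated update. This is a minimal illustrative sketch, not the paper's exact equations: the gating form, weight names (`W_g`, `b_g`, `W_p`), and dimensions are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_cell_state(c_dec, h_lm, W_g, b_g, W_p):
    """Update the seq2seq decoder's LSTM memory cell with LM information.

    c_dec : (d,)  decoder memory cell state at the current step
    h_lm  : (m,)  hidden state of the pre-trained LM
    A learned sigmoid gate decides, per dimension, how much projected
    LM information is added to the cell. All parameters here are
    hypothetical placeholders for illustration.
    """
    g = sigmoid(W_g @ np.concatenate([c_dec, h_lm]) + b_g)  # fusion gate in (0, 1)
    c_fused = c_dec + g * (W_p @ h_lm)  # gated LM contribution to the memory
    return c_fused

# Toy dimensions for a single decoding step.
d, m = 4, 3
c_dec = rng.standard_normal(d)
h_lm = rng.standard_normal(m)
W_g = rng.standard_normal((d, d + m))
b_g = np.zeros(d)
W_p = rng.standard_normal((d, m))

c_fused = fuse_cell_state(c_dec, h_lm, W_g, b_g, W_p)
```

The fused cell `c_fused` would then replace the ordinary cell state inside the decoder LSTM recurrence, so the adjustment persists in the decoder's memory across steps; the variants mentioned in the abstract differ in where this update is applied and in which of the cell/hidden states feed the final label inference.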