4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

End-to-end (E2E) automatic speech recognition (ASR) architectures fall into several model families, including connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention-based, and non-autoregressive mask-predict models. Since each of these architectures has pros and cons, a typical practice is to switch between separate models depending on the application requirements, which increases the overhead of maintaining all of them. Several methods have been proposed that integrate two of these complementary models to mitigate the overhead; however, integrating more models would yield further benefit from their complementarity and enable broader applications with a single system. This paper proposes four-decoder joint modeling (4D) of CTC, attention, RNN-T, and mask-predict, which has the following three advantages: 1) The four decoders are jointly trained so that they can be easily switched depending on the application scenario. 2) Joint training may regularize the model and improve its robustness thanks to the decoders' complementary properties. 3) Novel one-pass joint decoding methods using CTC, attention, and RNN-T further improve performance. The experimental results showed that the proposed model consistently reduced the word error rate (WER).
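The abstract does not state how the four decoder objectives are combined or how the joint decoding scores are formed. As a minimal sketch, assuming a weighted sum of per-decoder losses for training and a log-linear combination of per-decoder log-probabilities for scoring, the idea might look like the following. The function names, dictionary keys, and weight values are all hypothetical, and the scoring here is applied to complete hypotheses for simplicity, whereas one-pass joint decoding would combine scores inside the search itself.

```python
from typing import Dict, List


def joint_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-decoder training losses.

    Assumption: the joint objective is a weighted sum; the abstract
    does not give the exact form or the weight values.
    """
    if set(losses) != set(weights):
        raise ValueError("losses and weights must cover the same decoders")
    return sum(weights[name] * losses[name] for name in losses)


def joint_score(log_probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination of per-decoder scores for one hypothesis."""
    return sum(weights[name] * log_probs[name] for name in weights)


def best_hypothesis(candidates: List[dict], weights: Dict[str, float]) -> str:
    """Return the text of the candidate with the highest joint score."""
    best = max(candidates, key=lambda c: joint_score(c["log_probs"], weights))
    return best["text"]


# Hypothetical per-decoder losses for one training batch.
batch_losses = {"ctc": 2.0, "att": 1.0, "rnnt": 3.0, "mask": 4.0}
equal_weights = {"ctc": 0.25, "att": 0.25, "rnnt": 0.25, "mask": 0.25}
total = joint_loss(batch_losses, equal_weights)

# Hypothetical scores from the three decoders used in joint decoding.
decode_weights = {"ctc": 0.3, "att": 0.3, "rnnt": 0.4}
candidates = [
    {"text": "a", "log_probs": {"ctc": -1.0, "att": -2.0, "rnnt": -1.5}},
    {"text": "b", "log_probs": {"ctc": -3.0, "att": -0.5, "rnnt": -2.0}},
]
winner = best_hypothesis(candidates, decode_weights)
```

Because every decoder contributes to the same objective, setting a weight to zero in `decode_weights` corresponds to switching that decoder off at inference time, which mirrors the abstract's point that decoders can be swapped per application scenario.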

