Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They were designed for different purposes, and each has its own benefits and drawbacks. To leverage the strengths of both, we propose a hybrid Transducer and Attention based Encoder-Decoder (TAED) model for speech-to-text tasks. The new method leverages AED's strength in non-monotonic sequence-to-sequence learning while retaining Transducer's streaming property. In the proposed framework, Transducer and AED share the same speech encoder. The predictor in Transducer is replaced by the decoder of the AED model, so the decoder outputs are conditioned on the speech inputs rather than produced by an unconditioned language model. The proposed solution ensures that the model is optimized over all possible read/write scenarios and creates a matched environment for streaming applications. We evaluate the proposed approach on the MuST-C dataset, and the results show that TAED performs significantly better than Transducer on offline automatic speech recognition (ASR) and speech-to-text translation (ST) tasks. In the streaming case, TAED outperforms Transducer on the ASR task and in one ST direction, while achieving comparable results in the other translation direction.
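
The structural difference described above, a predictor that sees only the label history versus a decoder that also attends to the speech encoder, can be illustrated with a toy sketch. This is not the paper's implementation. The running-mean "predictor", the single dot-product attention step, the additive tanh joiner, the rule that the decoder at frame t sees only frames 0..t, and every function name are assumptions chosen to keep the example small and dependency-free.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, frames):
    """Dot-product attention of one query over the visible encoder frames."""
    weights = softmax([dot(query, f) for f in frames])
    return [sum(w * f[d] for w, f in zip(weights, frames))
            for d in range(len(query))]


def predictor_states(token_embs):
    """Transducer-style predictor (toy): depends on the label history only."""
    running = [0.0] * len(token_embs[0])
    states = []
    for count, emb in enumerate(token_embs, start=1):
        running = [r + e for r, e in zip(running, emb)]
        states.append([r / count for r in running])
    return states


def taed_decoder_states(token_embs, enc_frames, t):
    """TAED-style decoder (toy): label history plus attention over frames 0..t.

    Restricting attention to frames 0..t is an assumption made here to mimic
    a streaming setting; it is not taken from the abstract.
    """
    visible = enc_frames[: t + 1]
    return [[q + c for q, c in zip(query, attend(query, visible))]
            for query in predictor_states(token_embs)]


def joint_lattice(enc_frames, token_embs, use_taed):
    """Combine the shared encoder output with predictor/decoder states per (t, u)."""
    lattice = []
    for t, frame in enumerate(enc_frames):
        if use_taed:
            states = taed_decoder_states(token_embs, enc_frames, t)
        else:
            states = predictor_states(token_embs)
        lattice.append([[math.tanh(h + s) for h, s in zip(frame, state)]
                        for state in states])
    return lattice


rng = random.Random(0)
DIM, T, U = 4, 5, 3  # feature size, encoder frames, target labels
enc_frames = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(T)]
# One embedding for a start-of-sentence symbol plus U label embeddings.
token_embs = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(U + 1)]

transducer = joint_lattice(enc_frames, token_embs, use_taed=False)
taed = joint_lattice(enc_frames, token_embs, use_taed=True)
```

Both lattices have the same T x (U + 1) shape, so a transducer-style loss over all alignment paths could be applied to either one. The difference is that the second input to the joiner is identical for every frame in the Transducer case, whereas in the TAED-style case it changes with t because it depends on the speech seen so far.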
