Streaming Joint Speech Recognition and Disfluency Detection

Disfluency detection has mainly been handled with a pipeline approach, as a post-processing step after speech recognition. In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection and that work in a streaming manner. Compared to pipeline approaches, the joint models can leverage acoustic information, which makes disfluency detection robust to recognition errors and provides non-verbal clues. Moreover, joint modeling results in low-latency and lightweight inference. We investigate two joint model variants for streaming disfluency detection: a transcript-enriched model and a multi-task model. The transcript-enriched model is trained on text with special tags indicating the starting and ending points of the disfluent part. However, it has problems with latency and standard language model adaptation, which arise from the additional disfluency tags. To solve these problems, we propose a multi-task model that has two output layers at the Transformer decoder: one for speech recognition and the other for disfluency detection. It is conditioned on the currently recognized token through an additional token-dependency mechanism. We show that the proposed joint models outperform a BERT-based pipeline approach in both accuracy and latency on both the Switchboard corpus and the Corpus of Spontaneous Japanese.
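To make the transcript-enriched target format concrete, the following is a minimal sketch of how token-level disfluency labels could be turned into a tagged transcript and recovered again. The tag symbols `<dis>` and `</dis>`, the function names, and the example sentence are assumptions made for illustration; the abstract does not specify the actual tag vocabulary or preprocessing used by the authors.

```python
from typing import List, Tuple

# Hypothetical tag symbols; the abstract does not name the real ones.
OPEN_TAG = "<dis>"
CLOSE_TAG = "</dis>"


def enrich(tokens: List[str], labels: List[int]) -> List[str]:
    """Wrap each maximal run of disfluent tokens (label 1) in start/end tags."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    out: List[str] = []
    inside = False
    for token, label in zip(tokens, labels):
        if label and not inside:
            out.append(OPEN_TAG)   # a disfluent span starts here
            inside = True
        elif not label and inside:
            out.append(CLOSE_TAG)  # the span ended before this token
            inside = False
        out.append(token)
    if inside:
        out.append(CLOSE_TAG)      # close a span that runs to the end
    return out


def split_tags(enriched: List[str]) -> Tuple[List[str], List[int]]:
    """Invert enrich(): recover the plain tokens and per-token labels."""
    tokens: List[str] = []
    labels: List[int] = []
    inside = False
    for symbol in enriched:
        if symbol == OPEN_TAG:
            inside = True
        elif symbol == CLOSE_TAG:
            inside = False
        else:
            tokens.append(symbol)
            labels.append(1 if inside else 0)
    return tokens, labels


if __name__ == "__main__":
    words = "i want a flight to boston uh to denver".split()
    marks = [0, 0, 0, 0, 1, 1, 1, 0, 0]  # 1 = disfluent (illustrative labels)
    print(" ".join(enrich(words, marks)))
```

In this sketch every disfluent span adds two symbols to the output sequence, and the tags become part of the text the decoder must predict. This is one way to picture the latency and language model adaptation issues the abstract attributes to the additional disfluency tags, and why a separate output layer for disfluency labels avoids changing the transcript itself.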