Conversational speech often contains deviations from the speech plan, producing disfluent utterances that affect downstream NLP tasks. Removing these disfluencies is necessary to produce fluent and coherent speech. This paper presents DisfluencyFixer, a tool that performs speech-to-speech disfluency correction in English and Hindi using a pipeline of Automatic Speech Recognition (ASR), Disfluency Correction (DC) and Text-To-Speech (TTS) models. The system removes disfluencies from the input speech and returns fluent speech along with its transcript, the disfluency types and the total disfluency count in the source utterance. This makes it a one-stop destination for language learners who want to improve the fluency of their speech. In a subjective evaluation, the tool received scores of 4.26, 4.29 and 4.42 out of 5 for ASR performance, DC performance and ease of use, respectively. Our tool can be accessed openly at the following link.
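The ASR → DC → TTS pipeline described above can be illustrated with a minimal Python sketch. It assumes that each stage can be modeled as a plain function, and it injects the ASR and TTS stages as callables. The DC stage here is a hypothetical toy rule-based stand-in, not the trained DC model used by DisfluencyFixer. It removes only English filler words and immediate word repetitions, and then reports the disfluency types found and their count. All names (`toy_disfluency_correction`, `disfluency_fixer`, `PipelineOutput`, `FILLERS`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical English filler list; the real tool also handles Hindi
# and uses a learned DC model rather than rules.
FILLERS = {"um", "uh", "er"}


@dataclass
class PipelineOutput:
    fluent_audio: bytes          # TTS output for the corrected text
    fluent_transcript: str       # transcript of the fluent speech
    disfluency_types: List[str]  # distinct disfluency types detected
    disfluency_count: int        # total disfluencies in the source utterance


def toy_disfluency_correction(transcript: str) -> tuple[str, List[str]]:
    """Toy DC stand-in: drops fillers and immediately repeated words."""
    kept: List[str] = []
    found: List[str] = []
    for token in transcript.split():
        word = token.lower().strip(",.")
        if word in FILLERS:
            found.append("filler")
            continue
        if kept and kept[-1].lower().strip(",.") == word:
            found.append("repetition")
            continue
        kept.append(token)
    return " ".join(kept), found


def disfluency_fixer(
    audio: bytes,
    asr: Callable[[bytes], str],
    tts: Callable[[str], bytes],
) -> PipelineOutput:
    transcript = asr(audio)                              # 1. speech -> disfluent text
    fluent_text, found = toy_disfluency_correction(transcript)  # 2. DC
    fluent_audio = tts(fluent_text)                      # 3. fluent text -> speech
    return PipelineOutput(
        fluent_audio=fluent_audio,
        fluent_transcript=fluent_text,
        disfluency_types=sorted(set(found)),
        disfluency_count=len(found),
    )


# Usage with stub ASR/TTS models (hypothetical placeholders).
def fake_asr(audio: bytes) -> str:
    return "um I I want to uh book a ticket"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


out = disfluency_fixer(b"raw-audio-placeholder", fake_asr, fake_tts)
print(out.fluent_transcript)  # I want to book a ticket
print(out.disfluency_count)   # 3
```

Injecting the ASR and TTS stages as callables keeps the sketch independent of any particular speech toolkit. It also mirrors how a modular pipeline lets each component be swapped out, for example a Hindi ASR model in place of an English one.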