Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We extend a large-scale speech recognition model to detect inappropriate pauses in dysarthric speech. To this end, we contribute a task design, a labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. Second, following this task design, we label pause locations at the text level along with their appropriateness, collaborating with speech-language pathologists to establish labeling criteria and ensure high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. We also propose a task-tailored metric that evaluates inappropriate pause detection independently of ASR performance. Our experiments show that the proposed method detects inappropriate pauses in dysarthric speech more accurately than the baselines, reaching an Inappropriate Pause Error Rate of 14.47%.
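To make the task design concrete, the following is a minimal sketch of what text with pause tags might look like and how pause labels can be compared separately from the recognized words. The tag names `<ap>` and `<ip>`, the helper functions, and the toy error rate are all assumptions made for illustration; the abstract does not specify the tag vocabulary or define the Inappropriate Pause Error Rate, so this should not be read as the paper's metric.

```python
# Hypothetical tag vocabulary: the source does not name its pause tags.
APPROPRIATE = "<ap>"      # assumed tag for an appropriate pause
INAPPROPRIATE = "<ip>"    # assumed tag for an inappropriate pause


def inappropriate_positions(transcript: str) -> set[int]:
    """Return the 1-based word counts after which an inappropriate pause is tagged."""
    positions = set()
    n_words = 0
    for token in transcript.split():
        if token == INAPPROPRIATE:
            positions.add(n_words)
        elif token != APPROPRIATE:
            n_words += 1  # ordinary word, advance the word counter
    return positions


def toy_pause_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative only: (missed + spurious inappropriate pauses) / reference count.

    Assumes both transcripts contain the same word sequence, so that pause
    positions are comparable without an alignment step.
    """
    ref = inappropriate_positions(reference)
    hyp = inappropriate_positions(hypothesis)
    if not ref:
        raise ValueError("reference contains no inappropriate pauses")
    missed = len(ref - hyp)
    spurious = len(hyp - ref)
    return (missed + spurious) / len(ref)


reference = "I want <ip> to go <ap> home now"
hypothesis = "I want to go <ap> home <ip> now"
rate = toy_pause_error_rate(reference, hypothesis)
```

Because the comparison looks only at pause positions, word-level recognition errors do not enter the score in this sketch. A real evaluation would first need to align hypothesis and reference words, which is omitted here.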