Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, as they do for the general population. However, it is not clear to what degree these systems fail, how they can be improved, or how much people want to use them. In this paper, we first address these questions using results from a 61-person survey of people who stutter. We find that participants want to use speech recognition but are frequently cut off or misunderstood, or receive speech predictions that do not represent their intent. In a second study, in which 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.
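Word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's hypothesis (substitutions + deletions + insertions), divided by the number of reference words. The sketch below computes this standard definition. It is illustrative only: the function name and example commands are hypothetical, and it applies no text normalization (casing, punctuation, filler handling), which real evaluations usually apply and which this abstract does not specify. The examples show how a repeated word adds an insertion, and how being cut off mid-command adds one deletion per lost word.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only and
    does no normalization of case, punctuation, or dysfluent fillers.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits to turn ref[:i-1] into hyp[:j]
    # (Levenshtein distance over words, kept as a rolling row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "set a timer for ten minutes"
    # A part-word repetition transcribed as an extra word: 1 insertion / 6 words.
    print(word_error_rate(ref, "set a a timer for ten minutes"))
    # The utterance is cut off after three words: 3 deletions / 6 words = 0.5.
    print(word_error_rate(ref, "set a timer"))
```

Under this definition, the reported drop from 25.4% to 9.9% is 15.5 percentage points in absolute terms, or roughly a 61% relative reduction.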