Consumer speech recognition systems do not work as well for many people with speech differences, such as stuttering, as they do for the general population. However, it is unclear how poorly these systems perform, how they can be improved, or how much people want to use them. In this paper, we first address these questions using results from a 61-person survey of people who stutter and find that participants want to use speech recognition but are frequently cut off or misunderstood, or receive speech predictions that do not represent their intent. In a second study, where 91 people who stutter recorded voice assistant commands and dictation, we quantify how dysfluencies impede performance in a consumer-grade speech recognition system. Through three technical investigations, we demonstrate how many common errors can be prevented, resulting in a system that cuts utterances off 79.1% less often and improves word error rate from 25.4% to 9.9%.