Many consumer speech recognition systems are not tuned for people with speech disabilities, which leads to poor recognition and a poor user experience, especially for people with severe speech differences. Recent studies have highlighted interest in personalized speech models among people with atypical speech patterns. We propose a query-by-example-based personalized phrase recognition system that is trained on small amounts of speech, is language agnostic, does not assume a traditional pronunciation lexicon, and generalizes well across speech difference severities. On an internal dataset collected from 32 people with dysarthria, this approach works regardless of severity and shows a 60% improvement in recall relative to a commercial speech recognition system. On the public EasyCall dataset of dysarthric speech, our approach improves accuracy by 30.5%. Performance degrades as the number of phrases increases, but the system consistently outperforms ASR systems when trained with 50 unique phrases.
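To make the query-by-example idea concrete, the sketch below shows one generic way such matching can work: a user enrolls a few recordings per phrase, and a new utterance is assigned to the phrase whose enrolled example it most resembles. This is not the system described in the abstract. It substitutes classic dynamic time warping (DTW) over raw feature frames for whatever learned representation the authors use, and it assumes feature extraction has already happened. The names `dtw_distance`, `recognize`, `enrolled`, and `reject_above` are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]      # one feature vector (assumed precomputed per audio frame)
Utterance = Sequence[Frame]  # variable-length sequence of frames


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Utterance, template: Utterance) -> float:
    """Length-normalized DTW alignment cost between two utterances."""
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        raise ValueError("utterances must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i query frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize(
    query: Utterance,
    enrolled: Dict[str, List[Utterance]],
    reject_above: float = float("inf"),
) -> Optional[str]:
    """Return the enrolled phrase nearest to the query, or None if no
    example is within the (hypothetical) rejection threshold."""
    best_phrase: Optional[str] = None
    best_cost = float("inf")
    for phrase, examples in enrolled.items():
        for example in examples:
            c = dtw_distance(query, example)
            if c < best_cost:
                best_phrase, best_cost = phrase, c
    return best_phrase if best_cost <= reject_above else None


# Toy 1-D "features" standing in for real acoustic features.
enrolled = {
    "lights_on": [[[0.0], [1.0], [2.0]]],
    "call_home": [[[5.0], [5.0], [1.0]]],
}
query = [[0.1], [0.9], [1.1], [2.1]]
result = recognize(query, enrolled)
```

Because matching compares a query only against the speaker's own enrolled examples, a scheme like this needs no pronunciation lexicon and is indifferent to language, which mirrors the properties claimed in the abstract. It also hints at why performance would degrade as the phrase set grows: every added phrase is another candidate that a query can be confused with.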