Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?

The digits-in-noise (DIN) test is a practical speech audiometry tool for hearing screening in populations of varying ages and hearing status. The test is usually conducted either by a human supervisor (e.g., a clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test presents 24 digit triplets in an adaptive staircase procedure and yields a speech reception threshold (SRT). We propose an alternative automated DIN test setup that evaluates spoken responses without a human supervisor, using the open-source automatic speech recognition toolkit Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19-64 years) completed one DIN+Kaldi-NL test. Their spoken responses were recorded and used to evaluate the transcripts decoded by Kaldi-NL. Study 1 evaluated Kaldi-NL performance through its word error rate (WER), defined here as the summed decoding errors on digits in the transcript, expressed as a percentage of the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0-48%, SD = 8.8%), with decoding errors in three triplets per participant on average. Study 2 used bootstrapping simulations to analyse the effect of triplets with Kaldi-NL decoding errors on the DIN test output (SRT). Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that the proposed setup could be feasible for clinical applications.
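
The digit-only WER described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how decoding errors were counted, so this sketch assumes the common convention of token-level edit distance (substitutions, deletions and insertions), computed per response after discarding every non-digit token. The function names `edit_distance` and `digit_wer`, the `DIGITS` vocabulary, and the representation of digits as the strings "0" to "9" are all hypothetical choices made for illustration, not the authors' implementation.

```python
# Assumed digit vocabulary: digits are represented as the strings "0".."9".
DIGITS = set("0123456789")


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def digit_wer(spoken, decoded):
    """Digit-only WER in percent over paired lists of token lists.

    spoken:  what the listener actually said, one token list per response
    decoded: the recogniser transcript, one token list per response
    """
    total_errors = 0
    total_digits = 0
    for ref_tokens, hyp_tokens in zip(spoken, decoded):
        # Keep only digit tokens, so fillers such as "uh" are ignored.
        ref = [t for t in ref_tokens if t in DIGITS]
        hyp = [t for t in hyp_tokens if t in DIGITS]
        total_errors += edit_distance(ref, hyp)
        total_digits += len(ref)
    if total_digits == 0:
        return 0.0
    return 100.0 * total_errors / total_digits


# Two hypothetical responses; the second contains one substituted digit.
spoken = [["3", "5", "9"], ["1", "uh", "4", "2"]]
decoded = [["3", "5", "9"], ["1", "4", "8"]]
wer = digit_wer(spoken, decoded)
```

In this made-up example, one of six spoken digits is decoded incorrectly, so `wer` is one sixth expressed as a percentage. Summing errors and digits before dividing follows the abstract's wording of "summed decoding errors"; averaging per-response rates instead would weight short responses more heavily.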

