The digits-in-noise (DIN) test is a practical speech audiometry tool for hearing screening in populations of varying ages and hearing status. The test is usually conducted either by a human supervisor (e.g., a clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test presents 24 digit triplets in an adaptive staircase procedure and yields a speech reception threshold (SRT). We propose an alternative automated DIN test setup that evaluates spoken responses without a human supervisor, using the open-source automatic speech recognition toolkit Kaldi-NL. Thirty Dutch adults with self-reported normal hearing (19-64 years) each completed one DIN+Kaldi-NL test. Their spoken responses were recorded and used to evaluate the transcripts of the responses decoded by Kaldi-NL. Study 1 evaluated Kaldi-NL performance through its word error rate (WER), defined here as the summed decoding errors in the transcript, counting digits only, as a percentage of the total number of digits in the spoken responses. The average WER across participants was 5.0% (range 0-48%, SD = 8.8%), with decoding errors in three triplets per participant on average. Study 2 used bootstrapping simulations to analyse the effect of triplets with Kaldi-NL decoding errors on the DIN test output (SRT). Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications.
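The digit-only WER described for Study 1 can be illustrated with a minimal sketch. It assumes that decoding errors are counted as the edit distance (substitutions, deletions, insertions) between the spoken and decoded digit sequences of each triplet; the abstract does not specify the alignment method, so this is one plausible reading rather than the study's actual scoring code. The function names `edit_distance` and `digit_wer` and the example digits are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (assumed error count)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def digit_wer(spoken: Sequence[Sequence[str]],
              decoded: Sequence[Sequence[str]]) -> float:
    """Summed digit errors as a percentage of all spoken digits."""
    if len(spoken) != len(decoded):
        raise ValueError("spoken and decoded must contain the same number of responses")
    total_digits = sum(len(ref) for ref in spoken)
    if total_digits == 0:
        raise ValueError("no spoken digits to score")
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in zip(spoken, decoded))
    return 100.0 * total_errors / total_digits


# Hypothetical example: two triplets, one digit missed by the recogniser.
spoken = [["3", "5", "8"], ["1", "4", "9"]]
decoded = [["3", "5", "8"], ["1", "9"]]
example_wer = digit_wer(spoken, decoded)  # 1 error out of 6 spoken digits
```

In this illustration, non-digit words in the transcript would need to be filtered out before scoring, since the abstract states that only digits are counted.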