Recent advances in automatic speech recognition (ASR) have accelerated its adoption in the medical domain, where the performance of ASR models on accented medical named entities (NEs), such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset covering 93 African accents. Our analysis reveals that although some models achieve low overall word error rates (WER), their error rates on clinical entities are higher, potentially posing substantial risks to patient safety. To demonstrate this empirically, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE recall, medical WER, and character error rate (CER). Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), increasing the practical applicability of these models in healthcare environments.
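To make the metrics named above concrete, the following is a minimal Python sketch of WER, CER, an entity recall, and a relative improvement. It is not the alignment algorithm described in the abstract: entity recall here uses simple exact token-span matching as a stand-in, and all function names, the example sentences, and the numbers are hypothetical illustrations.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of tokens or characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference.lower(), hypothesis.lower()) / len(reference)


def entity_recall(entities: List[str], hypothesis: str) -> float:
    """Fraction of reference entities found verbatim in the hypothesis.

    Assumption: exact token-span matching, a simplification that stands in
    for a proper entity alignment step.
    """
    if not entities:
        raise ValueError("entities must not be empty")
    hyp = hypothesis.lower().split()

    def contains(span: List[str]) -> bool:
        n = len(span)
        return any(hyp[i:i + n] == span for i in range(len(hyp) - n + 1))

    found = sum(contains(e.lower().split()) for e in entities)
    return found / len(entities)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, e.g. 0.40 to 0.28 gives 0.30."""
    return (baseline - improved) / baseline


# Hypothetical example: one drug name is mis-transcribed.
reference = "patient started on metformin and amlodipine"
hypothesis = "patient started on met forming and amlodipine"
entities = ["metformin", "amlodipine"]

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
print(f"Entity recall: {entity_recall(entities, hypothesis):.2f}")
print(f"Relative improvement: {relative_improvement(0.40, 0.28):.2f}")
```

In this toy case a single mis-transcribed drug name halves the entity recall, which illustrates why entity-level metrics can diverge from overall WER. The relative improvement helper shows how a figure such as "25-34% relative" would be computed from a baseline and a fine-tuned error rate.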