Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slow to adopt this technology, in part because medically relevant transcription mistakes must be avoided. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically relevant mistakes more than others. We demonstrate that this metric aligns more closely with clinician preferences on medical sentences than other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins. We collect a benchmark of preferences from 13 clinicians on 149 realistic medical sentences, called the Clinician Transcript Preference benchmark (CTP), demonstrate on it that CBERTScore more closely matches what clinicians prefer, and release the benchmark for the community to further develop clinically aware ASR metrics.