In the rapidly evolving landscape of medical documentation, accurately transcribing clinical dialogues is increasingly important. This study explores the potential of Large Language Models (LLMs) to improve the accuracy of Automatic Speech Recognition (ASR) systems in medical transcription. Using the PriMock57 dataset, which covers a diverse range of primary care consultations, we apply advanced LLMs to refine ASR-generated transcripts. Our evaluation is multifaceted. We measure improvements in general Word Error Rate (WER), in Medical Concept WER (MC-WER), which targets the accurate transcription of essential medical terms, and in speaker diarization accuracy. We also assess whether LLM post-processing improves semantic textual similarity, thereby preserving the contextual integrity of clinical dialogues. Through a series of experiments, we compare zero-shot and Chain-of-Thought (CoT) prompting for improving diarization and correction accuracy. Our findings show that LLMs, particularly with CoT prompting, not only improve the diarization accuracy of existing ASR systems but also achieve state-of-the-art performance in this domain. The improvement extends to capturing medical concepts more accurately and to enhancing the overall semantic coherence of the transcribed dialogues. These findings illustrate the dual role of LLMs: they augment ASR outputs and also excel at transcription tasks on their own. They hold significant promise for transforming medical ASR systems and producing more accurate and reliable patient records in healthcare settings.
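WER is the primary metric named in this abstract. It is conventionally computed as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The following is a minimal Python sketch of that standard definition. It assumes plain whitespace tokenization with no case or punctuation normalization. The function name `word_error_rate` is hypothetical. The abstract does not specify how this study normalizes text, or how MC-WER restricts scoring to medical concepts, so neither is modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are whitespace-separated tokens; no normalization
    (casing, punctuation, number formatting) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    return prev[len(hyp)] / len(ref)
```

As an example, `word_error_rate("the patient has chest pain", "the patient has chess pain")` counts one substitution over five reference words, giving 0.2. Dropping "the" and appending "today" yields one deletion plus one insertion, giving 0.4.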