Automatic speech recognition for French medical conversations remains challenging, with word error rates often exceeding 30% in spontaneous clinical speech. This study proposes a multi-pass LLM post-processing architecture that alternates between Speaker Recognition and Word Recognition passes to improve transcription accuracy and speaker attribution. Ablation studies on two French clinical datasets (suicide prevention telephone counseling and preoperative awake neurosurgery consultations) investigate four design choices: model selection, prompting strategy, pass ordering, and iteration depth. With Qwen3-Next-80B, Wilcoxon signed-rank tests show a significant reduction in word diarization error rate (WDER) on the suicide prevention conversations (p<0.05, n=18), while performance on the awake neurosurgery consultations remains stable (n=10). The pipeline produced zero output failures at an acceptable computational cost (real-time factor, RTF, of 0.32), suggesting feasibility for offline clinical deployment, pending validation on larger corpora.
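The alternating-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `multi_pass_refine`, `speaker_pass`, `word_pass`, and `real_time_factor` are hypothetical, the two passes are plain callables standing in for LLM calls, and the length-check fallback is an assumed safeguard rather than something the abstract specifies. The `speaker_first` and `iterations` parameters correspond loosely to the pass-ordering and iteration-depth design choices.

```python
from typing import Callable, List, Tuple

# A transcript segment as (speaker label, text). This representation is an
# assumption made for illustration only.
Segment = Tuple[str, str]
Pass = Callable[[List[Segment]], List[Segment]]


def multi_pass_refine(
    segments: List[Segment],
    speaker_pass: Pass,
    word_pass: Pass,
    iterations: int = 2,
    speaker_first: bool = True,
) -> List[Segment]:
    """Alternate a speaker-attribution pass and a word-correction pass.

    Both passes are hypothetical callables (for example, wrappers around an
    LLM prompt) that take and return a list of segments.
    """
    order = (speaker_pass, word_pass) if speaker_first else (word_pass, speaker_pass)
    current = list(segments)
    for _ in range(iterations):
        for run_pass in order:
            candidate = run_pass(current)
            # Assumed safeguard: keep the previous transcript if a pass
            # changes the number of segments (treated here as a failure).
            if len(candidate) == len(current):
                current = candidate
    return current


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time divided by audio duration (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

Under this definition, an RTF of 0.32 means that one hour of audio takes roughly 19 minutes to post-process, which is workable for offline use but not for live transcription.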