Code-switching (CS), in which Vietnamese speech incorporates English words such as drug names or procedure names, is common in Vietnamese medical communication. This poses challenges for Automatic Speech Recognition (ASR) systems, especially for low-resource languages like Vietnamese. Most current ASR systems struggle to correctly recognize English medical terms within Vietnamese sentences, and no existing benchmark addresses this challenge. In this paper, we construct a 34-hour \textbf{Vi}etnamese \textbf{Med}ical \textbf{C}ode-\textbf{S}witching \textbf{S}peech dataset (ViMedCSS) containing 16,576 utterances. Each utterance includes at least one English medical term drawn from a curated bilingual lexicon covering five medical topics. Using this dataset, we evaluate several state-of-the-art ASR models and compare fine-tuning strategies for improving medical term recognition, in order to identify the most effective approach on this dataset. Experimental results show that Vietnamese-optimized models perform better on general segments, while multilingual pretraining helps capture English insertions. Combining the two approaches yields the best balance between overall and code-switched accuracy. This work provides the first benchmark for Vietnamese medical code-switching and offers insights into effective domain adaptation for low-resource, multilingual ASR systems.