Translating electronic health record (EHR) narratives from English to Spanish is a clinically important yet challenging task, owing to the lack of a parallel-aligned corpus and the abundance of unknown words in such narratives. To address these challenges, we propose \textbf{NOOV} (for No OOV, where OOV stands for out-of-vocabulary), a new neural machine translation (NMT) system that requires little in-domain parallel-aligned data for training. NOOV integrates a bilingual lexicon automatically learned from parallel-aligned corpora and a phrase look-up table extracted from a large biomedical knowledge resource to alleviate both the unknown word problem and the word-repeat challenge in NMT, thereby improving the phrase generation of NMT systems. Evaluation shows that NOOV generates better translations of EHR narratives, with improvements in both accuracy and fluency.