xMEN: A Modular Toolkit for Cross-Lingual Medical Entity Normalization

Objective: To improve the performance of medical entity normalization across many languages, especially those with fewer language resources than English.

Materials and Methods: We introduce xMEN, a modular system for cross-lingual medical entity normalization that performs well in both low- and high-resource scenarios. When synonyms in the target language are scarce for a given terminology, we leverage English aliases via cross-lingual candidate generation. For candidate ranking, we incorporate a trainable cross-encoder model if annotations for the target task are available. We also evaluate cross-encoders trained in a weakly supervised manner on machine-translated datasets from a high-resource domain. Our system is publicly available as an extensible Python toolkit.

Results: xMEN improves on state-of-the-art performance across a wide range of multilingual benchmark datasets. Weakly supervised cross-encoders are effective when no training data is available for the target task. Because xMEN is compatible with the BigBIO framework, it can be easily used with existing and prospective datasets.

Discussion: Our experiments show the importance of balancing the output of general-purpose candidate generators with subsequent trainable re-rankers, which we achieve through a rank regularization term in the loss function of the cross-encoder. However, error analysis reveals that multi-word expressions and other complex entities remain challenging.

Conclusion: xMEN exhibits strong performance for medical entity normalization in multiple languages, even when no labeled data and few terminology aliases are available for the target language. Its configuration system and evaluation modules enable reproducible benchmarks. Models and code are available online at https://github.com/hpi-dhc/xmen
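
To illustrate the idea of a rank regularization term mentioned in the Discussion, the following is a minimal sketch and not the xMEN implementation. It assumes that candidates are listed in the candidate generator's rank order (best first), that the main loss is a softmax cross-entropy on the gold concept, and that the regularizer is a second cross-entropy pulling the re-ranker toward the generator's top-ranked candidate. The function names and the weight `lam` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_regularized_loss(scores, gold_index, lam=0.5):
    """Cross-entropy on the gold candidate plus a rank regularizer.

    scores:     re-ranker scores, one per candidate, ordered by the
                candidate generator's ranking (index 0 = generator's top pick).
    gold_index: position of the correct concept in that list.
    lam:        weight of the regularizer (hypothetical hyperparameter).
    """
    probs = softmax(scores)
    # Main term: push probability mass toward the annotated concept.
    gold_term = -math.log(probs[gold_index])
    # Assumed regularizer: also reward agreement with the generator's
    # top-ranked candidate, so the re-ranker does not discard that prior.
    rank_term = -math.log(probs[0])
    return gold_term + lam * rank_term


if __name__ == "__main__":
    candidate_scores = [2.0, 1.0, 0.0]  # toy re-ranker scores
    print(rank_regularized_loss(candidate_scores, gold_index=1, lam=0.0))
    print(rank_regularized_loss(candidate_scores, gold_index=1, lam=0.5))
```

With `lam=0.0` the sketch reduces to plain cross-entropy on the gold concept. Larger values keep the re-ranker closer to the generator's original ordering, which is one way to express the balance between the two components.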
