Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types such as person, location, and organization. In this work, we present VietMed-NER, the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 distinct types. We also present baseline results using various state-of-the-art pre-trained models, both encoder-only and sequence-to-sequence. We found that the pre-trained multilingual model XLM-R outperformed all monolingual models on both reference text and ASR output. In general, encoder-only models also performed better than sequence-to-sequence models on the NER task. By simply translating the transcripts, the dataset can be applied not only to Vietnamese but also to other languages. All code, data, and models are publicly available at https://github.com/leduckhai/MultiMed