Information Extraction (IE), encompassing Named Entity Recognition (NER), Named Entity Linking (NEL), and Relation Extraction (RE), is critical for transforming the rapidly growing volume of scientific publications into structured, actionable knowledge. This need is especially evident in fast-evolving biomedical fields such as the gut-brain axis, where research investigates complex interactions between the gut microbiota and brain-related disorders. Existing biomedical IE benchmarks, however, are often narrow in scope and rely heavily on distantly supervised or automatically generated annotations, limiting their utility for advancing robust IE methods. We introduce GutBrainIE, a benchmark based on more than 1,600 PubMed abstracts, manually annotated by biomedical and terminological experts with fine-grained entities, concept-level links, and relations. While grounded in the gut-brain axis, the benchmark's rich schema, multiple tasks, and combination of highly curated and weakly supervised data make it broadly applicable to the development and evaluation of biomedical IE systems across domains.