Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. In this work, we focus mainly on sentence-level biomedical triple extraction. In addition, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To address these challenges, we first compare the performance of various large language models. We then present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.