Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. In this work, we focus mainly on sentence-level biomedical triple extraction. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To address these challenges, we first compare the performance of various large language models. We then present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.