This paper presents a method for improving relation extraction from biomedical text, focusing on chemical-gene interactions. Our approach combines the BioBERT model with a multi-layer fully connected network and integrates the ChemProt and DrugProt datasets through a novel merging strategy. Extensive experiments show significant performance improvements, most notably for the CPR (chemical-protein relation) groups shared between the two datasets. These results indicate that merging datasets increases the number of training samples and improves model accuracy. The study also highlights the potential of automated information extraction in biomedical research and clinical practice.
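To make the idea of dataset merging concrete, the following is a minimal sketch of one possible reading, not the strategy used in this work, which the abstract does not specify. It assumes that DrugProt relation labels can be mapped onto ChemProt CPR groups and that samples from both corpora are then pooled, with exact duplicates dropped. The mapping table `DRUGPROT_TO_CPR`, the function `merge_datasets`, and the `(sentence, label)` sample format are all hypothetical names introduced for illustration.

```python
from collections import Counter

# ASSUMPTION: an illustrative, partial mapping from DrugProt relation labels
# to ChemProt CPR groups. The actual correspondence used in the paper is not
# given in the abstract.
DRUGPROT_TO_CPR = {
    "ACTIVATOR": "CPR:3",
    "INDIRECT-UPREGULATOR": "CPR:3",
    "INHIBITOR": "CPR:4",
    "INDIRECT-DOWNREGULATOR": "CPR:4",
    "AGONIST": "CPR:5",
    "ANTAGONIST": "CPR:6",
    "SUBSTRATE": "CPR:9",
    "PRODUCT-OF": "CPR:9",
}


def merge_datasets(chemprot, drugprot, mapping=DRUGPROT_TO_CPR):
    """Pool two lists of (sentence, label) samples under CPR labels.

    ChemProt samples are kept as they are. DrugProt samples are relabelled
    through `mapping`; samples whose label has no CPR counterpart are skipped,
    and exact (sentence, label) duplicates are added only once.
    """
    merged = list(chemprot)
    seen = set(chemprot)
    for sentence, label in drugprot:
        cpr_label = mapping.get(label)
        if cpr_label is None:
            continue  # relation type not shared with ChemProt
        sample = (sentence, cpr_label)
        if sample in seen:
            continue  # already present in the pooled data
        seen.add(sample)
        merged.append(sample)
    return merged


def samples_per_group(samples):
    """Count how many samples fall into each CPR group."""
    return Counter(label for _, label in samples)
```

Comparing `samples_per_group` before and after merging shows which shared CPR groups gain training samples, which is the effect the abstract attributes to merging. The pooled samples would then be passed to the BioBERT-based classifier.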