Comparison of pipeline, sequence-to-sequence, and GPT models for end-to-end relation extraction: experiments with the rare disease use-case

End-to-end relation extraction (E2ERE) is an important and realistic application of natural language processing (NLP) in biomedicine. In this paper, we compare three prevailing paradigms for E2ERE using a complex dataset focused on rare diseases that involves discontinuous and nested entities. We use the RareDis information extraction dataset to evaluate three competing approaches to E2ERE: NER $\rightarrow$ RE pipelines, joint sequence-to-sequence models, and generative pre-trained transformer (GPT) models. We use comparable state-of-the-art models and best practices for each approach and conduct error analyses to assess their failure modes. Our findings reveal that pipeline models are still the best, with sequence-to-sequence models not far behind; GPT models with eight times as many parameters perform worse than even sequence-to-sequence models and lose to pipeline models by over 10 F1 points. Partial matches and discontinuous entities caused many NER errors, which contributed to lower overall E2E performance. We also verify these findings on a second E2ERE dataset for chemical-protein interactions. Although generative LM-based methods are more suitable for zero-shot settings, our results show that when training data is available, it is better to work with more conventional models trained and tailored for E2ERE. More innovative methods are needed to combine the best of both worlds, the smaller encoder-decoder pipeline models and the larger GPT models, to improve E2ERE. As of now, we see that well-designed pipeline models offer substantial performance gains at a lower cost and carbon footprint for E2ERE. Our contribution is also the first to conduct E2ERE on the RareDis dataset.
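To make the scoring concrete, the following is a minimal sketch of strict-match precision, recall, and F1 over relation tuples, assuming each entity is a type plus a set of character spans so that a discontinuous entity can have more than one span. It is not the paper's evaluation script: the helper names (`entity`, `strict_prf`), the offsets, and the `produces` relation label are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Set, Tuple

Span = Tuple[int, int]                  # (start, end) character offsets
Entity = Tuple[str, FrozenSet[Span]]    # (entity type, one or more spans)
Relation = Tuple[Entity, str, Entity]   # (head entity, relation type, tail entity)


def entity(etype: str, *spans: Span) -> Entity:
    """Build an entity; passing several spans models a discontinuous entity."""
    return (etype, frozenset(spans))


def strict_prf(gold: Set[Relation], pred: Set[Relation]) -> Tuple[float, float, float]:
    """Precision, recall, F1 where a relation counts only on an exact match
    of both entities (type and every span) and the relation type."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical offsets and labels, for illustration only.
disease = entity("RAREDISEASE", (0, 12))
sign_a = entity("SIGN", (30, 38))
sign_b = entity("SIGN", (45, 52), (60, 66))      # discontinuous: two spans
sign_b_partial = entity("SIGN", (45, 52))        # model missed the second span

gold = {(disease, "produces", sign_a), (disease, "produces", sign_b)}
pred = {(disease, "produces", sign_a), (disease, "produces", sign_b_partial)}

print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this strict criterion, the partially matched discontinuous entity costs both a false positive and a false negative at the relation level, which is one way an NER error of the kind described above can lower the end-to-end score.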

