Information extraction techniques, including named entity recognition (NER) and relation extraction (RE), are crucial in many domains because they help make sense of vast amounts of unstructured text by identifying and connecting relevant information. Such techniques can assist researchers in extracting valuable insights. In this paper, we introduce Entity-aware Masking for Biomedical Relation Extraction (EMBRE), a method applied in the context of BioRED challenge Task 1, in which human-annotated entities are provided as input. Specifically, we integrate entity knowledge into a deep neural network by pretraining the backbone model with an entity masking objective: we randomly mask named entities in each instance and let the model identify each masked entity along with its type. In this way, the model can learn more specific knowledge and more robust representations. We then use the pretrained model as our backbone to encode language representations and feed these representations into two multilayer perceptrons (MLPs) that predict the logits for relation and novelty, respectively. The experimental results demonstrate that our proposed method improves the performance of entity pair, relation, and novelty extraction over our baseline.
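To make the entity masking objective concrete, the following is a minimal sketch of how training instances for such an objective might be constructed. It is not the EMBRE implementation. The `Entity` record, the `mask_entities` helper, the `[MASK]` placeholder, the per-entity masking probability `mask_prob`, and the choice to mask every token of a selected entity are all assumptions made for illustration; the abstract states only that entities are masked at random and that the model must recover the masked entity and its type.

```python
import random
from dataclasses import dataclass

MASK = "[MASK]"  # assumed placeholder token; the abstract does not name one


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token of the entity (inclusive)
    end: int    # index one past the last token of the entity (exclusive)
    type: str   # annotated entity type, e.g. "ChemicalEntity"


def mask_entities(tokens, entities, mask_prob=0.5, rng=None):
    """Replace randomly selected entity spans with MASK tokens.

    Returns the masked token list and one prediction target per masked
    span: (start, end, original tokens, entity type). mask_prob is a
    hypothetical hyperparameter, not a value taken from the paper.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    targets = []
    for ent in entities:
        if rng.random() < mask_prob:
            original = tuple(tokens[ent.start:ent.end])
            targets.append((ent.start, ent.end, original, ent.type))
            for i in range(ent.start, ent.end):
                masked[i] = MASK
    return masked, targets


# Illustrative instance with hand-written annotations (not BioRED data).
tokens = "Aspirin reduces the risk of heart attack".split()
entities = [
    Entity(0, 1, "ChemicalEntity"),
    Entity(5, 7, "DiseaseOrPhenotypicFeature"),
]
masked_tokens, targets = mask_entities(tokens, entities, rng=random.Random(0))
print(masked_tokens)
print(targets)
```

Each target pairs the masked surface form with its entity type, which corresponds to the two things the model is asked to identify during pretraining. How the backbone scores those targets (for example, per-token vocabulary prediction plus a type classifier) is left open here because the abstract does not specify it.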