Canonical relation extraction aims to extract relational triples from sentences, where the triple elements (an entity pair and the relation between them) are mapped to a knowledge base. Recently, methods based on the encoder-decoder architecture have been proposed and have achieved promising results. However, these methods make poor use of entity information, which they treat merely as augmented training data. Moreover, they cannot represent novel entities, since no embeddings have been learned for them. In this paper, we propose a novel framework, Bi-Encoder-Decoder (BED), to address both issues. Specifically, to fully utilize entity information, we employ an encoder to encode the semantics of this information, which yields high-quality entity representations. For novel entities, representations can be generated directly from the trained entity encoder. Experimental results on two datasets show that our method achieves a significant performance improvement over the previous state of the art and handles novel entities well without retraining.
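
The idea of generating a representation for a novel entity from its information, rather than looking up a learned embedding, can be illustrated with a minimal sketch. This is not the BED model: the learned entity encoder is replaced here by a toy hashed bag-of-words function, and the names `encode`, `score`, `EntityIndex`, and `DIM`, as well as the example entities and descriptions, are hypothetical and chosen only for illustration. The point is that adding an entity requires only a call to the encoder, with no retraining of an embedding table.

```python
import hashlib
import math

DIM = 64  # embedding size; an arbitrary choice for this sketch


def encode(text):
    """Toy stand-in for a trained encoder (assumption, not the BED encoder):
    hashed bag-of-words counts, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % DIM
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score(query_vec, entity_vec):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(q * e for q, e in zip(query_vec, entity_vec))


class EntityIndex:
    """Maps entity ids to vectors computed from entity descriptions."""

    def __init__(self):
        self.vectors = {}

    def add(self, entity_id, description):
        # A novel entity only needs to be encoded; nothing is retrained.
        self.vectors[entity_id] = encode(description)

    def rank(self, query_text):
        """Return entity ids sorted by similarity to the query, best first."""
        query_vec = encode(query_text)
        return sorted(
            self.vectors,
            key=lambda eid: score(query_vec, self.vectors[eid]),
            reverse=True,
        )


index = EntityIndex()
index.add("E1", "capital city of France")
index.add("E2", "general purpose programming language")

# Later, an entity that was never seen before is added on the fly.
index.add("E3", "river flowing through central Europe")
ranking = index.rank("river flowing through central Europe")
```

In a setup closer to the one described above, `encode` would be a trained neural encoder and the query vector would come from the decoder state rather than from raw text; the lookup logic would otherwise keep the same shape.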