In this paper, we propose a novel method for joint entity and relation extraction from unstructured text, framing the task as a conditional sequence generation problem. In contrast to conventional generative information extraction models, which generate output token by token from left to right, our approach is span-based: it generates a linearized graph in which nodes represent text spans and edges represent relation triplets. Our method employs a transformer encoder-decoder architecture with a pointing mechanism over a dynamic vocabulary of spans and relation types. Span representations let the model capture the structural characteristics and boundaries of entities and relations, while the pointing mechanism grounds the generated output in the original text. Evaluation on benchmark datasets shows that our approach achieves competitive results. Code is available at https://github.com/urchade/ATG.
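To make the idea of pointing over a dynamic vocabulary concrete, the following is a minimal sketch and not the released ATG implementation. It assumes, purely for illustration, that a span is represented by the sum of its start and end token vectors, that candidate spans are capped at a maximum width, and that scores are plain dot products with the decoder state. The function names, the relation labels `works_for` and `lives_in`, and the random vectors standing in for encoder outputs are all hypothetical.

```python
import math
import random


def enumerate_spans(num_tokens, max_width):
    """Return all (start, end) spans, end inclusive, of at most max_width tokens."""
    return [(s, e)
            for s in range(num_tokens)
            for e in range(s, min(s + max_width, num_tokens))]


def build_dynamic_vocab(token_vecs, relation_vecs, max_width):
    """Build the per-input vocabulary: candidate spans plus relation types."""
    entries, vectors = [], []
    for s, e in enumerate_spans(len(token_vecs), max_width):
        entries.append(("span", s, e))
        # Assumed span representation: sum of start and end token vectors.
        vectors.append([a + b for a, b in zip(token_vecs[s], token_vecs[e])])
    for name, vec in relation_vecs.items():
        entries.append(("rel", name))
        vectors.append(list(vec))
    return entries, vectors


def pointer_distribution(decoder_state, vectors):
    """Softmax over dot-product scores between the decoder state and each entry."""
    scores = [sum(d * v for d, v in zip(decoder_state, vec)) for vec in vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Toy usage: random vectors stand in for encoder and decoder outputs.
random.seed(0)
dim = 4
tokens = ["Alain", "Farley", "works", "at", "McGill"]
token_vecs = [[random.gauss(0, 1) for _ in range(dim)] for _ in tokens]
relation_vecs = {name: [random.gauss(0, 1) for _ in range(dim)]
                 for name in ("works_for", "lives_in")}  # hypothetical labels
decoder_state = [random.gauss(0, 1) for _ in range(dim)]

entries, vectors = build_dynamic_vocab(token_vecs, relation_vecs, max_width=2)
probs = pointer_distribution(decoder_state, vectors)
best = entries[max(range(len(probs)), key=probs.__getitem__)]

if best[0] == "span":
    # A span entry maps directly back to input tokens, so the output is grounded.
    print("span:", tokens[best[1]:best[2] + 1])
else:
    print("relation type:", best[1])
```

Because every span entry is an index pair into the input, a decoder step that selects a span can only emit text that actually occurs in the source sentence, which is the grounding property the abstract attributes to the pointing mechanism.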