In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models, which generate output token by token from left to right, our approach is \textit{span-based}: it generates a linearized graph in which nodes represent text spans and edges represent relation triplets. Our method employs a transformer encoder-decoder architecture with a pointing mechanism over a dynamic vocabulary of spans and relation types. Span representations allow the model to capture the structural characteristics and boundaries of entities and relations, while the pointing mechanism grounds the generated output in the original text. Evaluation on benchmark datasets shows that our approach is effective and achieves competitive results. Code is available at https://github.com/urchade/ATG.
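To make the idea of a linearized graph over a dynamic vocabulary concrete, the following is a minimal sketch rather than the released implementation. The layout (typed spans first, a separator, then head/tail/relation triplets), the `<SEP>` symbol, the helper names `build_dynamic_vocab` and `linearize`, and the example sentence and labels are all assumptions made for illustration. Each target symbol is an index into a vocabulary built from the input sentence, which is how a pointing decoder stays grounded in the text.

```python
from typing import List, Tuple, Union

Span = Tuple[int, int, str]        # (start, end, entity_type), end inclusive
Triple = Tuple[Span, Span, str]    # (head_span, tail_span, relation_type)
Symbol = Union[Span, str]          # a typed span, the separator, or a relation type

SEP = "<SEP>"  # hypothetical separator between the node part and the edge part


def build_dynamic_vocab(num_tokens: int, max_width: int,
                        entity_types: List[str],
                        relation_types: List[str]) -> List[Symbol]:
    """Enumerate every typed span of up to max_width tokens for this input,
    then append the separator and the relation types."""
    vocab: List[Symbol] = []
    for start in range(num_tokens):
        for end in range(start, min(start + max_width, num_tokens)):
            for entity_type in entity_types:
                vocab.append((start, end, entity_type))
    vocab.append(SEP)
    vocab.extend(relation_types)
    return vocab


def linearize(entities: List[Span], triples: List[Triple]) -> List[Symbol]:
    """Emit nodes sorted by position, a separator, then each edge as
    head span, tail span, relation type."""
    sequence: List[Symbol] = list(sorted(entities))
    sequence.append(SEP)
    for head, tail, relation in triples:
        sequence.extend([head, tail, relation])
    return sequence


# Illustrative input; the sentence and labels are made up for this sketch.
tokens = ["Alain", "Farley", "works", "at", "McGill", "University"]
entities = [(0, 1, "PER"), (4, 5, "ORG")]
triples = [((0, 1, "PER"), (4, 5, "ORG"), "Work_For")]

vocab = build_dynamic_vocab(len(tokens), max_width=2,
                            entity_types=["PER", "ORG"],
                            relation_types=["Work_For"])
target = linearize(entities, triples)
# Decoder targets are positions in the input-dependent vocabulary.
target_ids = [vocab.index(symbol) for symbol in target]
```

Because the vocabulary is rebuilt for every input, the decoder in such a setup can only emit spans that actually occur in the sentence, in contrast to a token-level generator that may produce text not present in the source.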