In knowledge graph construction, a challenging issue is how to extract complex (e.g., overlapping) entities and relationships from a small amount of unstructured historical data. Traditional pipeline methods divide the extraction into two separate subtasks, which ignores the potential interaction between them and may lead to error propagation. In this work, we propose an effective cascade dual-decoder method to extract overlapping relational triples, which includes a text-specific relation decoder and a relation-corresponded entity decoder. The text-specific relation decoder detects relations at the text level, that is, according to the semantic information of the whole sentence. For each extracted relation, which is represented by a trainable embedding, the relation-corresponded entity decoder detects the corresponding head and tail entities using a span-based tagging scheme. Because entities are extracted separately for each relation, the same entity or entity pair can appear in several triples, so the overlapping triple problem is tackled naturally. We conducted experiments on a real-world open-pit mine dataset and two public datasets to verify the method's generalizability. The experimental results demonstrate the effectiveness and competitiveness of the proposed method, which achieves better F1 scores under strict evaluation metrics. Our implementation is available at https://github.com/prastunlp/DualDec.
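To make the cascade concrete, the following is a minimal sketch of the decoding step only, written in plain Python. It is not taken from the released implementation: the function names, the dictionary layout of the scores, the 0.5 threshold, the nearest-end rule for closing a span, and the exhaustive pairing of head and tail spans are all assumptions made for illustration. The scores are hand-written stand-ins for what the two decoders might output.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Span-based tagging (assumed rule): pair each start tag with the
    nearest end tag located at or after it."""
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for start, p in enumerate(start_probs):
        if p < threshold:
            continue
        following = [e for e in ends if e >= start]
        if following:
            spans.append((start, following[0]))
    return spans


def decode_triples(tokens: List[str],
                   relation_probs: Dict[str, float],
                   entity_probs: Dict[str, Dict[str, List[float]]],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Stage 1: keep relations scored above the threshold at the text level.
    Stage 2: for each kept relation, decode its own head and tail spans."""
    triples = []
    for relation, prob in relation_probs.items():
        if prob < threshold:
            continue
        tags = entity_probs[relation]
        heads = decode_spans(tags["head_start"], tags["head_end"], threshold)
        tails = decode_spans(tags["tail_start"], tags["tail_end"], threshold)
        # Assumption: every head span is paired with every tail span.
        for hs, he in heads:
            for ts, te in tails:
                triples.append((" ".join(tokens[hs:he + 1]), relation,
                                " ".join(tokens[ts:te + 1])))
    return triples


def one_hot(length: int, index: int) -> List[float]:
    """Hypothetical helper: a score vector with a single confident position."""
    return [1.0 if i == index else 0.0 for i in range(length)]


# Hand-written example scores (illustrative, not model output).
tokens = ["Paris", "is", "the", "capital", "of", "France"]
n = len(tokens)
relation_probs = {"capital_of": 0.9, "located_in": 0.8, "born_in": 0.1}
paris_france = {"head_start": one_hot(n, 0), "head_end": one_hot(n, 0),
                "tail_start": one_hot(n, 5), "tail_end": one_hot(n, 5)}
entity_probs = {"capital_of": paris_france, "located_in": paris_france}

triples = decode_triples(tokens, relation_probs, entity_probs)
print(triples)
```

In this toy sentence the same entity pair takes part in two relations, and both triples are recovered because the entity spans are decoded once per detected relation rather than once per sentence.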