Open information extraction (OpenIE) aims to extract schema-free triplets of the form (\emph{subject}, \emph{predicate}, \emph{object}) from a given sentence. Compared with general information extraction (IE), OpenIE is more challenging for IE models, especially when a sentence contains multiple complicated triplets. To extract such triplets more effectively, we propose a novel generative OpenIE model, \emph{DualOIE}, which performs a dual task alongside triplet extraction: converting the triplets back into the sentence. This dual task encourages the model to correctly recognize the structure of the given sentence and thus helps it extract all potential triplets. Specifically, DualOIE extracts triplets in two steps: 1) it first extracts a sequence of all potential predicates, and 2) it then uses the predicate sequence as a prompt to induce the generation of triplets. Experiments on two benchmarks and on our own dataset constructed from Meituan show that DualOIE achieves the best performance compared with state-of-the-art baselines. Furthermore, an online A/B test on the Meituan platform shows improvements of 0.93\% in QV-CTR and 0.56\% in UV-CTR when the triplets extracted by DualOIE are leveraged in Meituan's search system.
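The two-step extraction and the dual task described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `[PRED]` marker, the separators, and the callables `predict_predicates` and `generate_triplets` (stand-ins for the generative model) are all hypothetical.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def build_triplet_prompt(sentence: str, predicates: List[str]) -> str:
    """Build the step-2 input: the sentence followed by the predicate sequence.

    The "[PRED]" marker and the " ; " separator are assumptions made for
    illustration; the abstract does not specify a prompt format.
    """
    return f"{sentence} [PRED] {' ; '.join(predicates)}"


def extract_triplets(
    sentence: str,
    predict_predicates: Callable[[str], List[str]],
    generate_triplets: Callable[[str], List[Triplet]],
) -> List[Triplet]:
    """Two-step extraction: predicates first, then prompted triplet generation."""
    # Step 1: extract a sequence of all potential predicates.
    predicates = predict_predicates(sentence)
    # Step 2: use the predicate sequence as a prompt for triplet generation.
    prompt = build_triplet_prompt(sentence, predicates)
    return generate_triplets(prompt)


def linearize_triplets(triplets: List[Triplet]) -> str:
    """Serialize triplets as the input of the dual task (triplets -> sentence).

    The "(s; p; o) | (s; p; o)" layout is an assumed serialization.
    """
    return " | ".join(f"({s}; {p}; {o})" for s, p, o in triplets)
```

In this sketch the two callables would be backed by the same generative model in practice; they are kept as parameters so that the control flow can be exercised with toy stand-ins.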