Universal Information Extraction (UIE) attracts interest because of the challenges posed by varying targets, heterogeneous structures, and demand-specific schemas. However, previous work has achieved only limited success by unifying a few tasks, such as Named Entity Recognition (NER) and Relation Extraction (RE); such models fall short of authentic UIE, particularly when extracting other general schemas such as quadruples and quintuples. Additionally, these models use an implicit structural schema instructor, which can lead to incorrect links between types and hinder the model's generalization and performance in low-resource scenarios. In this paper, we redefine authentic UIE with a formal formulation that encompasses almost all extraction schemas. To the best of our knowledge, we are the first to introduce UIE for any kind of schema. In addition, we propose RexUIE, a Recursive Method with Explicit Schema Instructor for UIE. To avoid interference between different types, we reset the position ids and attention mask matrices. RexUIE shows strong performance under both full-shot and few-shot settings and achieves state-of-the-art results on tasks that extract complex schemas.
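The abstract states that position ids and attention mask matrices are reset to avoid interference between types, but it does not give the exact scheme. The following is a minimal sketch of one way such a reset could look, under assumptions that are not taken from the paper: the input is laid out as one shared leading token, then one segment per schema type, then the text; every type segment restarts its position ids at 1; and tokens in different type segments are masked from each other. The function name `build_isolated_inputs` and its parameters are hypothetical.

```python
def build_isolated_inputs(type_lengths, text_length):
    """Build position ids and an attention mask for an ASSUMED layout:
    [shared token] + one segment per schema type + text tokens.

    Illustrative only; this is not the scheme defined in the paper.
    """
    # Segment label per token: -1 marks shared tokens (leading token and text),
    # i marks tokens that belong to the i-th schema type segment.
    segments = [-1]
    position_ids = [0]

    for seg, length in enumerate(type_lengths):
        segments.extend([seg] * length)
        # Assumption: each type segment restarts at position 1, so no type
        # is favoured by appearing earlier in the prompt.
        position_ids.extend(range(1, length + 1))

    # Assumption: text positions start right after the longest type segment.
    text_start = 1 + max(type_lengths, default=0)
    segments.extend([-1] * text_length)
    position_ids.extend(range(text_start, text_start + text_length))

    n = len(segments)
    # A token pair may attend if either token is shared or both tokens
    # belong to the same type segment; different types never see each other.
    attention_mask = [
        [
            1 if (segments[i] == -1 or segments[j] == -1
                  or segments[i] == segments[j]) else 0
            for j in range(n)
        ]
        for i in range(n)
    ]
    return position_ids, attention_mask


# Two schema types of 2 and 3 tokens, followed by 4 text tokens.
position_ids, attention_mask = build_isolated_inputs([2, 3], 4)
print(position_ids)
```

In this sketch, the text and the leading token remain visible to every type segment, so each type is still conditioned on the input while staying blind to its sibling types. Whether RexUIE uses this exact visibility pattern is not stated in the abstract.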