Universal Information Extraction (UIE) has attracted interest because extraction targets vary, output structures are heterogeneous, and schemas are specific to each demand. However, previous work has achieved only limited success by unifying a few tasks, such as Named Entity Recognition (NER) and Relation Extraction (RE). Such models fall short of authentic UIE, particularly when extracting other general schemas such as quadruples and quintuples. These models also rely on an implicit structural schema instructor, which can lead to incorrect links between types and hinder generalization and performance in low-resource scenarios. In this paper, we redefine authentic UIE with a formal formulation that encompasses almost all extraction schemas. To the best of our knowledge, we are the first to introduce UIE for any kind of schema. In addition, we propose RexUIE, a Recursive Method with Explicit Schema Instructor for UIE. To avoid interference between different types, we reset the position ids and attention mask matrices. RexUIE shows strong performance under both full-shot and few-shot settings and achieves state-of-the-art results on the tasks of extracting complex schemas.
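The abstract states that position ids and attention masks are reset to avoid interference between types, but it gives no details. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes the input is laid out as several schema-type groups followed by the text, that each group restarts its position ids at 0, that text positions continue after the longest group, and that tokens from different groups cannot attend to each other while text tokens remain visible to everything. The function name `build_inputs` and its parameters are hypothetical.

```python
from typing import List, Tuple


def build_inputs(
    group_lengths: List[int], text_length: int
) -> Tuple[List[int], List[List[int]]]:
    """Return (position_ids, attention_mask) for the layout [group_1 ... group_k, text].

    Hypothetical helper: the layout and masking rules are assumptions, not
    taken from the paper.
    """
    position_ids: List[int] = []
    segment_of: List[int] = []  # group index per token, -1 marks a text token

    # Each schema group restarts at position 0, so no group is "earlier" than another.
    for group_index, length in enumerate(group_lengths):
        position_ids.extend(range(length))
        segment_of.extend([group_index] * length)

    # Assumption: text positions continue after the longest schema group.
    text_start = max(group_lengths, default=0)
    position_ids.extend(range(text_start, text_start + text_length))
    segment_of.extend([-1] * text_length)

    # 1 means "may attend", 0 means "blocked".
    n = len(position_ids)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_group = segment_of[i] == segment_of[j]
            involves_text = segment_of[i] == -1 or segment_of[j] == -1
            # Tokens of two different schema groups never see each other.
            if same_group or involves_text:
                mask[i][j] = 1
    return position_ids, mask


if __name__ == "__main__":
    # Two schema groups of 2 and 3 tokens, followed by 2 text tokens.
    ids, attention = build_inputs([2, 3], 2)
    print(ids)
    for row in attention:
        print(row)
```

Under these assumptions, the two groups share the same starting position and are mutually invisible, so adding or reordering types in the prompt would not change what any single type's tokens can see.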