We propose a new paradigm for universal information extraction (IE) that is compatible with any schema format and applies to a range of IE tasks, including named entity recognition, relation extraction, event extraction, and sentiment analysis. Our approach recasts text-based IE tasks as a token-pair problem, uniformly decomposing all extraction targets into joint span detection, classification, and association problems within a unified extractive framework, UniEX. UniEX encodes the schema-based prompt and the input text simultaneously, and collaboratively learns generalized knowledge from pre-defined information using auto-encoder language models. We develop a triaffine attention mechanism that integrates heterogeneous factors, including tasks, labels, and inside tokens, and obtains the extraction targets from a scoring matrix. Experimental results show that UniEX outperforms generative universal IE models in both performance and inference speed on $14$ benchmark IE datasets in the supervised setting. Its state-of-the-art performance in low-resource scenarios further verifies the transferability and effectiveness of UniEX.
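To make the token-pair idea concrete, the following is a minimal sketch of a generic triaffine scoring step, not the exact UniEX formulation. It assumes that each label and each token is already represented as a vector of the same dimension, and that a cubic weight tensor combines a label vector with a span-start vector and a span-end vector. The names `triaffine_scores`, `decode_spans`, `weight`, and `threshold` are hypothetical and chosen only for illustration.

```python
from itertools import product


def triaffine_scores(labels, starts, ends, weight):
    """Compute scores[k][i][j] for label k, start token i, end token j.

    Assumed generic form (not taken from the paper):
        scores[k][i][j] = sum over a, b, c of
            weight[a][b][c] * labels[k][a] * starts[i][b] * ends[j][c]
    All vectors share dimension d, and weight is a d x d x d nested list.
    """
    d = len(weight)
    scores = []
    for label in labels:
        plane = []
        for start in starts:
            row = []
            for end in ends:
                total = 0.0
                for a, b, c in product(range(d), repeat=3):
                    total += weight[a][b][c] * label[a] * start[b] * end[c]
                row.append(total)
            plane.append(row)
        scores.append(plane)
    return scores


def decode_spans(scores, threshold=0.0):
    """Return (label, start, end) triples with start <= end and score > threshold."""
    return [
        (k, i, j)
        for k, plane in enumerate(scores)
        for i, row in enumerate(plane)
        for j, value in enumerate(row)
        if i <= j and value > threshold
    ]
```

In this sketch, the nested list returned by `triaffine_scores` plays the role of the scoring matrix, with one token-pair plane per label, and `decode_spans` reads extraction targets off it by thresholding. A practical model would learn `weight` and compute the product with batched tensor operations rather than Python loops.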