Open Information Extraction (OIE) aims to extract objective, structured knowledge from natural text, and has attracted growing attention to dedicated models built on human experience. As large language models (LLMs) have exhibited remarkable in-context learning capabilities, a question arises as to whether OIE can be effectively tackled with this paradigm. In this paper, we explore solving the OIE problem by constructing an appropriate reasoning environment for LLMs. Specifically, we first propose a method to effectively estimate the discrepancy in syntactic distribution between an LLM and test samples, which serves as correlation evidence for preparing positive demonstrations. Building on this evidence, we introduce a simple yet effective mechanism to establish the reasoning environment for LLMs on specific tasks. Without bells and whistles, experimental results on the standard CaRB benchmark demonstrate that our $6$-shot approach outperforms the state-of-the-art supervised method, achieving a $55.3$ $F_1$ score. Further experiments on TACRED and ACE05 show that our method generalizes naturally to other information extraction tasks, yielding improvements of $5.7$ and $6.8$ $F_1$ points, respectively.