Information extraction (IE) tasks such as event extraction require an in-depth understanding of the output structure and sub-task dependencies. They rely heavily on task-specific training data in the form of (passage, target structure) pairs to obtain reasonable performance. However, obtaining such data through human annotation is costly, so real-world applications urgently need low-resource information extraction approaches that require minimal human labeling. Fine-tuning supervised models with synthesized training data would be a generalizable method, but existing data generation methods either still rely on large-scale ground-truth data or perform too poorly to be applied to complicated IE tasks. To address these challenges, we propose STAR, a data generation method that leverages Large Language Models (LLMs) to synthesize data instances given limited seed demonstrations, thereby boosting low-resource information extraction performance. Our approach first generates target structures (Y) and then generates passages (X), both with the aid of LLMs. We design fine-grained step-by-step instructions to obtain the initial data instances. We further reduce errors and improve data quality through self-reflection error identification and self-refinement with iterative revision. Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data. Human assessment of the data quality shows that STAR-generated data exhibits higher passage quality and aligns better with the task definitions than the human-curated data.
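The generation order described in the abstract (target structure Y first, then passage X, then self-reflection and iterative revision) can be sketched roughly as follows. This is only an illustrative outline, not the STAR implementation: the prompt strings, the `synthesize` and `fake_llm` names, the "OK" stopping convention, and the revision limit are all assumptions made for the example, and the LLM is replaced by a deterministic stub so the code runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Instance:
    structure: str  # target structure Y
    passage: str    # passage X


def synthesize(llm: LLM, seeds: List[str], max_revisions: int = 2) -> Instance:
    """Generate Y, then X, then reflect and revise (assumed prompt formats)."""
    demos = "\n".join(seeds)
    # Step 1: generate a target structure from the seed demonstrations.
    structure = llm(f"STRUCTURE\n{demos}")
    # Step 2: generate a passage conditioned on that structure.
    passage = llm(f"PASSAGE\n{structure}")
    # Step 3: self-reflection, followed by revision while errors are reported.
    for _ in range(max_revisions):
        critique = llm(f"REFLECT\n{structure}\n{passage}")
        if critique.strip() == "OK":  # assumed convention for "no errors found"
            break
        passage = llm(f"REVISE\n{critique}\n{structure}\n{passage}")
    return Instance(structure=structure, passage=passage)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM, used only to make the sketch runnable."""
    if prompt.startswith("STRUCTURE"):
        return '{"event": "Attack", "trigger": "bombed"}'
    if prompt.startswith("PASSAGE"):
        # Deliberately omits the trigger word so the reflection step has work to do.
        return "Rebels struck the depot."
    if prompt.startswith("REFLECT"):
        passage = prompt.splitlines()[-1]  # the passage is the last prompt line
        return "OK" if "bombed" in passage else "trigger 'bombed' missing"
    if prompt.startswith("REVISE"):
        return "Rebels bombed the depot."
    raise ValueError("unexpected prompt")


seeds = ['{"event": "Attack", "trigger": "fired"}']
instance = synthesize(fake_llm, seeds)
print(instance.structure)
print(instance.passage)
```

In this toy run the first passage fails the reflection check because it lacks the trigger word, so one revision round is applied before the loop stops. Generating the structure before the passage means the label is fixed up front and the passage only has to be made consistent with it.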