Structure prediction tasks such as event extraction require an in-depth understanding of the output structure and of the dependencies among sub-tasks, so they still rely heavily on task-specific training data to reach reasonable performance. Because human annotation is expensive, low-resource event extraction, which keeps human cost to a minimum, is urgently needed in real-world information extraction applications. To boost low-resource event extraction performance, we propose synthesizing data instances from a limited set of seed demonstrations. Our method, STAR, is a structure-to-text data generation approach that first generates complicated event structures (Y) and then generates the input passages (X), both with Large Language Models. We design fine-grained step-by-step instructions, and the error cases and quality issues identified through self-reflection are then self-refined. Our experiments indicate that data generated by STAR can significantly improve low-resource event extraction performance and is, in some cases, even more effective than human-curated data points.
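The structure-to-text order described above (generate Y first, then X, then refine) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the `role=value` output format, the function names, and the `LLM` callable type are all assumptions made for illustration, and a simple substring check stands in for the LLM-based self-reflection step.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


def generate_structure(llm: LLM, event_type: str, roles: List[str],
                       seeds: List[str]) -> Dict[str, str]:
    """Step 1: generate the event structure Y (trigger plus arguments)."""
    prompt = (
        f"Event type: {event_type}\n"
        f"Roles: {', '.join(roles)}\n"
        f"Seed examples: {seeds}\n"
        "Give a trigger and one argument per role as 'role=value' lines."
    )
    structure: Dict[str, str] = {}
    # Assumed output format: one 'role=value' pair per line.
    for line in llm(prompt).splitlines():
        if "=" in line:
            role, value = line.split("=", 1)
            structure[role.strip()] = value.strip()
    return structure


def generate_passage(llm: LLM, structure: Dict[str, str]) -> str:
    """Step 2: generate the input passage X conditioned on Y."""
    mentions = "; ".join(f"{role}: {value}" for role, value in structure.items())
    return llm("Write a passage that mentions: " + mentions)


def missing_mentions(passage: str, structure: Dict[str, str]) -> List[str]:
    """Stand-in quality check: structure values absent from the passage."""
    lowered = passage.lower()
    return [value for value in structure.values() if value.lower() not in lowered]


def synthesize(llm: LLM, event_type: str, roles: List[str],
               seeds: List[str], max_rounds: int = 2) -> Dict[str, object]:
    """Generate one (X, Y) training instance, refining X when issues are found."""
    structure = generate_structure(llm, event_type, roles, seeds)
    passage = generate_passage(llm, structure)
    for _ in range(max_rounds):
        missing = missing_mentions(passage, structure)
        if not missing:
            break
        # Step 3: ask the model to repair the issues that were identified.
        passage = llm(f"Revise the passage so it includes {missing}.\nPassage: {passage}")
    return {"passage": passage, "structure": structure}
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client; in practice the refinement check would itself be a model call rather than string matching.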