STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models - 专知论文

会员服务 ·

0

事件抽取 · Boosting（一种模型训练加速方式） · CASES · 语言模型化 · Performance ·

2023 年 5 月 24 日

STAR: Boosting Low-Resource Event Extraction by Structure-to-Text Data Generation with Large Language Models

翻译：STAR: 利用大语言模型的结构到文本数据生成提升低资源事件抽取

Mingyu Derek Ma,Xiaoxuan Wang,Po-Nien Kung,P. Jeffrey Brantingham,Nanyun Peng,Wei Wang

Structure prediction tasks such as event extraction require an in-depth understanding of the output structure and sub-task dependencies, thus they still heavily rely on task-specific training data to obtain reasonable performance. Due to the high cost of human annotation, low-resource event extraction, which requires minimal human cost, is urgently needed in real-world information extraction applications. We propose to synthesize data instances given limited seed demonstrations to boost low-resource event extraction performance. We propose STAR, a structure-to-text data generation method that first generates complicated event structures (Y) and then generates input passages (X), all with Large Language Models. We design fine-grained step-by-step instructions and the error cases and quality issues identified through self-reflection can be self-refined. Our experiments indicate that data generated by STAR can significantly improve the low-resource event extraction performance and they are even more effective than human-curated data points in some cases.

翻译：结构化预测任务（如事件抽取）需要深入理解输出结构及子任务间的依赖关系，因此仍高度依赖任务特定训练数据才能获得合理性能。由于人工标注成本高昂，低资源事件抽取（要求最小化人工成本）在真实世界信息抽取应用中具有迫切需求。我们提出通过有限种子样例合成数据实例来提升低资源事件抽取性能。我们提出STAR——一种结构到文本数据生成方法，该方法首先生成复杂事件结构（Y），再生成输入文本段落（X），全程使用大语言模型。我们设计了细粒度的逐步指令，并通过自反思机制自动修正识别出的错误案例与质量问题。实验表明，STAR生成的数据能显著提升低资源事件抽取性能，某些情况下甚至比人工标注数据更具优势。

0

相关内容

事件抽取

事件抽取指的是从非结构化文本中抽取事件信息，并将其以结构化形式呈现出来的任务。例如从“毛泽东1893 年出生于湖南湘潭”这句话中抽取事件{类型：出生，人物：毛泽东，时间：1893 年，出生地：湖南湘潭}。事件抽取任务通常包含事件类型识别和事件元素填充两个子任务。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

lncRNA DATOC1影响microRNA成熟促进卵巢癌转移的分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

肥胖相关Hepatokine LECT2在肝脏中的调控及机制

国家自然科学基金

1+阅读 · 2015年12月31日

基于连续波四相位法的飞行时间(TOF)三维成像图像传感器研究

国家自然科学基金

1+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

无人机自主导航中LiDAR点云与图像特征提取与配准方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于ChIA-PET系统解析猪骨骼肌卫星细胞3D基因组成肌分化调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

电离层热层对太阳辐射变化响应的特征时间

国家自然科学基金

0+阅读 · 2012年12月31日

基于空间平台的空间目标成像及识别方法研究

国家自然科学基金

2+阅读 · 2011年12月31日

养殖海域浮游动物群落结构演变对环境变化的生物地球化学响应

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

PatternGPT :A Pattern-Driven Framework for Large Language Model Text Generation

Arxiv

0+阅读 · 2023年7月13日

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

Arxiv

0+阅读 · 2023年7月12日

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Arxiv

0+阅读 · 2023年7月11日

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Arxiv

0+阅读 · 2023年7月11日

Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

Arxiv

0+阅读 · 2023年7月10日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月9日

Large Language Models for Supply Chain Optimization

Arxiv

0+阅读 · 2023年7月8日

LENS: A Learnable Evaluation Metric for Text Simplification

Arxiv

0+阅读 · 2023年7月7日

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

Arxiv

0+阅读 · 2023年7月7日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

VIP会员

文章信息

相关主题

Boosting（一种模型训练加速方式）

语言模型化

最新内容

博士论文 | 面向大模型推理的内存高效算法

博士论文 | 面向大模型推理的内存高效算法

专知会员服务

0+阅读 · 今天15:20

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

专知会员服务

0+阅读 · 今天15:18

《无人系统互操作性导论——无人系统联合架构（JAUS）》

《无人系统互操作性导论——无人系统联合架构（JAUS）》

专知会员服务

8+阅读 · 今天5:53

美空军新型反无人机部队初探

美空军新型反无人机部队初探

专知会员服务

4+阅读 · 今天5:45

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

《对抗性电磁环境下远程巡飞弹作战的安全指挥与控制数据链》

专知会员服务

2+阅读 · 今天5:23

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

《北约下一代建模与仿真（NexGen M&S）计划》2026年69页

专知会员服务

2+阅读 · 今天5:11

《防空交战流程的概率建模研究》

《防空交战流程的概率建模研究》

专知会员服务

6+阅读 · 今天5:04

ICML 2026 教程 | 数值优化理论还重要吗？

ICML 2026 教程 | 数值优化理论还重要吗？

专知会员服务

4+阅读 · 7月26日

ICM 2026 | 陶哲轩：人工智能时代的数学

ICM 2026 | 陶哲轩：人工智能时代的数学

专知会员服务

8+阅读 · 7月26日

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

《面向可扩展高韧性无人机集群网络的速度感知分层通信框架》

专知会员服务

8+阅读 · 7月26日

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

《面向概率推理的可定制战术引擎及其在军事任务规划中的应用》

专知会员服务

10+阅读 · 7月26日

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

《先进防空系统选型战略框架：基于巴基斯坦的实证启示》

专知会员服务

8+阅读 · 7月26日

《反无人机交战场景下的战斗归零研究》

《反无人机交战场景下的战斗归零研究》

专知会员服务

7+阅读 · 7月26日

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

霍尔木兹与不对称作战时代：水雷、无人系统与海军力量的重新定义

专知会员服务

4+阅读 · 7月26日

博士论文 | 用代码结构感知方法推进代码大模型

博士论文 | 用代码结构感知方法推进代码大模型

专知会员服务

5+阅读 · 7月25日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

45+阅读 · 2020年12月18日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论文解读 | 从预训练到后训练：理解大模型推理能力如何形成

美空军新型反无人机部队初探

博士论文 | 面向大模型推理的内存高效算法

《无人系统互操作性导论——无人系统联合架构（JAUS）》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

PatternGPT :A Pattern-Driven Framework for Large Language Model Text Generation

Arxiv

0+阅读 · 2023年7月13日

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

Arxiv

0+阅读 · 2023年7月12日

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Arxiv

0+阅读 · 2023年7月11日

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Arxiv

0+阅读 · 2023年7月11日

Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

Arxiv

0+阅读 · 2023年7月10日

A Survey on Evaluation of Large Language Models

Arxiv

0+阅读 · 2023年7月9日

Large Language Models for Supply Chain Optimization

Arxiv

0+阅读 · 2023年7月8日

LENS: A Learnable Evaluation Metric for Text Simplification

Arxiv

0+阅读 · 2023年7月7日

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

Arxiv

0+阅读 · 2023年7月7日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

相关基金

lncRNA DATOC1影响microRNA成熟促进卵巢癌转移的分子机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

肥胖相关Hepatokine LECT2在肝脏中的调控及机制

国家自然科学基金

1+阅读 · 2015年12月31日

基于连续波四相位法的飞行时间(TOF)三维成像图像传感器研究

国家自然科学基金

1+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

无人机自主导航中LiDAR点云与图像特征提取与配准方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于ChIA-PET系统解析猪骨骼肌卫星细胞3D基因组成肌分化调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

电离层热层对太阳辐射变化响应的特征时间

国家自然科学基金

0+阅读 · 2012年12月31日

基于空间平台的空间目标成像及识别方法研究

国家自然科学基金

2+阅读 · 2011年12月31日

养殖海域浮游动物群落结构演变对环境变化的生物地球化学响应

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员