Event extraction is a fundamental task in natural language processing that involves identifying and extracting information about events mentioned in text. However, it is a challenging task due to the lack of annotated data, which is expensive and time-consuming to obtain. The emergence of large language models (LLMs) such as ChatGPT provides an opportunity to solve language tasks with simple prompts without the need for task-specific datasets and fine-tuning. While ChatGPT has demonstrated impressive results in tasks like machine translation, text summarization, and question answering, it presents challenges when used for complex tasks like event extraction. Unlike other tasks, event extraction requires the model to be provided with a complex set of instructions defining all event types and their schemas. To explore the feasibility of ChatGPT for event extraction and the challenges it poses, we conducted a series of experiments. Our results show that ChatGPT has, on average, only 51.04% of the performance of a task-specific model such as EEQA in long-tail and complex scenarios. Our usability testing experiments indicate that ChatGPT is not robust enough, and continuous refinement of the prompt does not lead to stable performance improvements, which can result in a poor user experience. Besides, ChatGPT is highly sensitive to different prompt styles.
翻译:事件抽取是自然语言处理中的一项基础任务,旨在识别并提取文本中提及的事件信息。然而,由于标注数据获取成本高昂且耗时,该任务面临数据匮乏的挑战。以ChatGPT为代表的大型语言模型的涌现,为通过简单提示解决语言任务提供了可能,无需依赖特定任务的数据集和微调。尽管ChatGPT在机器翻译、文本摘要和问答等任务中展现出显著成效,但在事件抽取这类复杂任务中仍存在挑战。与其他任务不同,事件抽取要求模型配备一整套定义所有事件类型及其模式的复杂指令。为探究ChatGPT用于事件抽取的可行性及其面临的挑战,我们开展了一系列实验。结果表明,在长尾和复杂场景中,ChatGPT的平均性能仅为EEQA等任务专用模型的51.04%。可用性测试实验显示,ChatGPT的鲁棒性不足,持续优化提示词无法带来稳定的性能提升,可能导致用户体验不佳。此外,ChatGPT对不同的提示风格高度敏感。