Event extraction is a fundamental task in natural language processing that involves identifying and extracting information about events mentioned in text. It remains challenging because annotated data is scarce, and such data is expensive and time-consuming to obtain. The emergence of large language models (LLMs) such as ChatGPT offers an opportunity to solve language tasks with simple prompts, without task-specific datasets or fine-tuning. While ChatGPT has demonstrated impressive results on tasks such as machine translation, text summarization, and question answering, it presents challenges when used for complex tasks such as event extraction. Unlike those tasks, event extraction requires giving the model a complex set of instructions that defines every event type and its schema. To explore the feasibility of using ChatGPT for event extraction and the challenges involved, we conducted a series of experiments. Our results show that, in long-tail and complex scenarios, ChatGPT achieves on average only 51.04% of the performance of a task-specific model such as EEQA. Our usability testing experiments indicate that ChatGPT is not robust enough: continued refinement of the prompt does not yield stable performance improvements, which can result in a poor user experience. In addition, ChatGPT is highly sensitive to prompt style.
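To make concrete what "a complex set of instructions defining all event types and their schemas" can look like, the following is a minimal sketch of assembling such a prompt. It is not the prompt used in the experiments: the two event types, their argument roles, the `build_prompt` helper, and the requested JSON output format are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical schema: event type -> argument roles.
# Real event-extraction schemas typically define many more types and roles.
EVENT_SCHEMAS: Dict[str, List[str]] = {
    "Attack": ["Attacker", "Target", "Instrument", "Place"],
    "Transport": ["Agent", "Artifact", "Origin", "Destination"],
}


def build_prompt(text: str, schemas: Dict[str, List[str]]) -> str:
    """Assemble one instruction prompt listing every event type and its roles."""
    lines = [
        "Extract all events from the text below.",
        "Allowed event types and their argument roles:",
    ]
    # One line per event type, so the prompt grows with the size of the schema.
    for event_type, roles in schemas.items():
        lines.append(f"- {event_type}: {', '.join(roles)}")
    # The output format is an assumption made for this sketch.
    lines.append(
        'Answer as a JSON list of objects with keys "type", "trigger", and "arguments".'
    )
    lines.append(f"Text: {text}")
    return "\n".join(lines)


prompt = build_prompt("Rebels attacked the convoy near the border.", EVENT_SCHEMAS)
print(prompt)
```

Because every event type adds at least one line of instructions, the prompt lengthens as the schema grows, which illustrates why event extraction is harder to specify through prompting than tasks such as translation or summarization.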