The goal of temporal relation extraction is to infer the temporal relation between two events in a document. Supervised models dominate this task. In this work, we investigate ChatGPT's ability to perform zero-shot temporal relation extraction. We design three different prompting techniques to break down the task and evaluate ChatGPT. Our experiments show that ChatGPT's performance lags well behind that of supervised methods and depends heavily on prompt design. We further demonstrate that ChatGPT infers more of the small relation classes correctly than supervised methods do. We also discuss ChatGPT's current shortcomings on temporal relation extraction: it does not stay consistent during temporal inference, and it fails at long-dependency temporal inference.