Fact extraction is pivotal for constructing knowledge graphs. Recently, the increasing demand for temporal facts in downstream tasks has given rise to the task of temporal fact extraction. In this paper, we specifically address the extraction of temporal facts from natural language text. Previous studies fail to handle the challenge of establishing time-to-fact correspondences in complex sentences. To overcome this hurdle, we propose a timeline-based sentence decomposition strategy using large language models (LLMs) with in-context learning, ensuring a fine-grained understanding of the timeline associated with various facts. In addition, we evaluate the performance of LLMs for direct temporal fact extraction and obtain unsatisfactory results. Consequently, we introduce TSDRE, a method that incorporates the decomposition capabilities of LLMs into the traditional fine-tuning of smaller pre-trained language models (PLMs). To support the evaluation, we construct ComplexTRED, a complex temporal fact extraction dataset. Our experiments show that TSDRE achieves state-of-the-art results on both the HyperRED-Temporal and ComplexTRED datasets.
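To make the idea of timeline-based sentence decomposition with in-context learning concrete, the sketch below shows one plausible way to build such a prompt: a demonstration pairs a complex sentence with its time-anchored sub-sentences, and the target sentence is appended for the LLM to decompose. The prompt wording, demonstration, and helper function are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of timeline-based sentence decomposition via in-context
# learning. The prompt format and demonstration are assumptions for exposition.

# One demonstration: a complex sentence decomposed into time-anchored sub-sentences.
DEMONSTRATION = """\
Sentence: "Smith joined Acme in 1998 and became CEO in 2005."
Timeline decomposition:
1. [1998] Smith joined Acme.
2. [2005] Smith became CEO of Acme.
"""

def build_decomposition_prompt(sentence: str) -> str:
    """Compose an in-context learning prompt asking an LLM to split a complex
    sentence into sub-sentences, each bound to exactly one time expression."""
    return (
        "Decompose the sentence along its timeline so that each sub-sentence "
        "mentions exactly one time expression and the facts holding at that time.\n\n"
        f"{DEMONSTRATION}\n"
        f'Sentence: "{sentence}"\n'
        "Timeline decomposition:"
    )

if __name__ == "__main__":
    # The resulting prompt would be sent to an LLM; the decomposed, time-anchored
    # sub-sentences could then feed a fine-tuned PLM for temporal fact extraction.
    print(build_decomposition_prompt(
        "Garcia taught at MIT from 2001 to 2009 and moved to Stanford in 2010."
    ))
```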