In the fast-changing realm of information, the capacity to construct coherent timelines from extensive event-related content has become increasingly significant and challenging. The difficulty lies in aggregating related documents to build a meaningful event graph around a central topic. This paper proposes CHRONOS - Causal Headline Retrieval for Open-domain News Timeline SummarizatiOn via Iterative Self-Questioning, which offers a fresh perspective on integrating Large Language Models (LLMs) into the task of Timeline Summarization (TLS). By iteratively reflecting on how events are linked and posing new questions about a specific news topic to gather information online or from an offline knowledge base, LLMs produce and refresh chronological summaries based on the documents retrieved in each round. Furthermore, we curate Open-TLS, a novel dataset of timelines on recent news topics authored by professional journalists, to evaluate open-domain TLS, where information overload makes it impossible to retrieve a comprehensive set of relevant documents from the web. Our experiments indicate that CHRONOS is not only adept at open-domain timeline summarization but also rivals the performance of existing state-of-the-art systems designed for closed-domain applications, where a related news corpus is provided for summarization.
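The iterative self-questioning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM question generator, the retriever, and the summarizer are replaced by deterministic toy stubs, and every name here (`Doc`, `build_timeline`, `toy_ask`, `toy_retrieve`, `toy_summarize`, `CORPUS`) is a hypothetical introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Doc:
    date: str  # ISO date string, so lexical order equals chronological order
    text: str


# Hypothetical three-document corpus standing in for web search or a knowledge base.
CORPUS = [
    Doc("2024-01-02", "Storm forms offshore"),
    Doc("2024-01-05", "Storm makes landfall, causes flooding"),
    Doc("2024-01-09", "Flooding prompts relief funding"),
]


def toy_ask(topic: str, timeline: Dict[str, str]) -> List[str]:
    """Stub for the LLM questioner: ask about the topic, then about each known event."""
    if not timeline:
        return [f"What happened with the {topic}?"]
    return [f"What followed from: {text}?" for text in timeline.values()]


def toy_retrieve(question: str) -> List[Doc]:
    """Stub retriever: keyword overlap between the question and each document."""
    words = (w.strip("?,.:").lower() for w in question.split())
    keywords = {w for w in words if len(w) > 4}
    return [d for d in CORPUS if any(k in d.text.lower() for k in keywords)]


def toy_summarize(timeline: Dict[str, str], docs: List[Doc]) -> Dict[str, str]:
    """Stub summarizer: keep one entry per date, preferring what is already there."""
    merged = dict(timeline)
    for d in docs:
        merged.setdefault(d.date, d.text)
    return merged


def build_timeline(
    topic: str,
    ask: Callable[[str, Dict[str, str]], List[str]],
    retrieve: Callable[[str], List[Doc]],
    summarize: Callable[[Dict[str, str], List[Doc]], Dict[str, str]],
    rounds: int = 3,
) -> Dict[str, str]:
    timeline: Dict[str, str] = {}
    seen: Set[Doc] = set()
    asked: Set[str] = set()
    for _ in range(rounds):
        # Pose only questions that have not been asked in an earlier round.
        questions = [q for q in ask(topic, timeline) if q not in asked]
        if not questions:
            break
        asked.update(questions)
        # Collect documents that are new in this round.
        new_docs: List[Doc] = []
        for q in questions:
            for d in retrieve(q):
                if d not in seen:
                    seen.add(d)
                    new_docs.append(d)
        # Refresh the timeline using the newly retrieved documents.
        timeline = summarize(timeline, new_docs)
    return dict(sorted(timeline.items()))


if __name__ == "__main__":
    for date, event in build_timeline("storm", toy_ask, toy_retrieve, toy_summarize).items():
        print(date, event)
```

In this toy setting the first round only surfaces the two documents that mention the storm; the relief-funding event is reached in the second round, through a follow-up question generated from the flooding event. In a real system the three callables would wrap an LLM and a search backend.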