The increasing acceptance of large language models (LLMs) as alternative knowledge sources marks a significant paradigm shift across various domains, including time-sensitive fields such as law, healthcare, and finance. To fulfill this expanded role, LLMs must not only be factually accurate but also remain consistent across temporal dimensions, necessitating robust temporal reasoning capabilities. Despite this critical requirement, efforts to ensure the temporal consistency of LLMs remain scarce, with a notable absence of work that evaluates or augments LLMs across temporal references in time-sensitive inquiries. In this paper, we seek to address this gap by introducing a novel benchmark, temporal referential consistency, accompanied by a resource, TEMP-ReCon, designed to benchmark a wide range of both open-source and closed-source LLMs across linguistic contexts of differing resource richness (English, French, and Romanian). The findings emphasize that LLMs do exhibit insufficient temporal referential consistency. To address this, we propose \newmodel, a reasoning path alignment-based model that aims to enhance the temporal referential consistency of LLMs. Our empirical experiments substantiate the efficacy of UnTRaP compared to several baseline models.