Are large language models (LLMs) temporally grounded? Since LLMs cannot perceive and interact with the environment, it is impossible to answer this question directly. Instead, we provide LLMs with textual narratives and probe them with respect to their common-sense knowledge of the structure and duration of events, their ability to order events along a timeline, and self-consistency within their temporal model (e.g., temporal relations such as after and before are mutually exclusive for any pair of events). We evaluate state-of-the-art LLMs (such as LLaMA 2 and GPT-4) on three tasks reflecting these abilities. Generally, we find that LLMs lag significantly behind both human performance and small-scale, specialised LMs. In-context learning, instruction tuning, and chain-of-thought prompting reduce this gap only to a limited degree. Crucially, LLMs struggle the most with self-consistency, displaying incoherent behaviour in at least 27.23% of their predictions. Contrary to expectations, we also find that increasing model size does not guarantee gains in performance. To explain these results, we study the sources from which LLMs may gather temporal information: we find that sentence ordering in unlabelled texts, available during pre-training, is only weakly correlated with event ordering. Moreover, public instruction tuning mixtures contain few temporal tasks. Hence, we conclude that current LLMs lack a consistent temporal model of textual narratives. Code, datasets, and LLM outputs are available at https://github.com/yfqiu-nlp/temporal-llms.
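The self-consistency requirement mentioned in the abstract (after and before are mutually exclusive for any pair of events) can be illustrated with a minimal sketch. The function name `inconsistency_rate`, the input layout, and the toy answers below are assumptions made for illustration only; the exact metric used in the paper may be defined differently, and the reported 27.23% figure is not reproduced here.

```python
from typing import Dict, Tuple

# Hypothetical layout: for each ordered event pair (e1, e2), record whether the
# model answered "yes" when asked "did e1 happen before e2?" and when asked
# "did e1 happen after e2?".
Answers = Dict[Tuple[str, str], Dict[str, bool]]


def inconsistency_rate(answers: Answers) -> float:
    """Fraction of event pairs for which the model affirms both relations.

    Affirming both "before" and "after" for the same pair violates mutual
    exclusivity, so such a pair is counted as inconsistent.
    """
    if not answers:
        return 0.0
    inconsistent = sum(
        1 for a in answers.values() if a["before"] and a["after"]
    )
    return inconsistent / len(answers)


# Toy, made-up answers (not taken from the paper's data).
example: Answers = {
    ("wake up", "eat breakfast"): {"before": True, "after": False},
    ("board plane", "land"): {"before": True, "after": True},  # inconsistent
    ("graduate", "enrol"): {"before": False, "after": True},
    ("plant seed", "harvest"): {"before": True, "after": False},
}

rate = inconsistency_rate(example)
print(rate)  # 0.25
```

Only the case where both relations are affirmed is treated as inconsistent in this sketch; a stricter variant could also flag pairs for which the model rejects both relations when exactly one must hold.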