Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events, and to use it to reason over problems requiring such knowledge. This trait is essential in temporal natural language processing tasks, with possible applications such as timeline summarization, temporal question answering, and temporal natural language inference. Recent research on the performance of large language models suggests that, although they are adept at generating syntactically correct sentences and solving classification tasks, they often take shortcuts in their reasoning and fall prey to simple linguistic traps. This article provides an overview of research in the domain of temporal commonsense reasoning, focusing in particular on approaches that enhance language model performance through a variety of augmentations and on their evaluation across a growing number of datasets. Despite these augmentations, the models still struggle to approach human performance on reasoning tasks over temporal commonsense properties, such as the typical occurrence times, orderings, or durations of events. We further emphasize the need for careful interpretation of research to guard against overpromising evaluation results in light of the shallow reasoning present in transformers. This can be achieved by appropriately preparing datasets and choosing suitable evaluation metrics.