While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal expressions and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework for language-based TR. Instead of reasoning over the original context, we adopt a latent representation, the temporal graph (TG), which facilitates TR learning. We construct a synthetic dataset, TGQA, which is fully controllable and requires minimal supervision, for fine-tuning LLMs on this text-to-TG translation task. Experiments confirm that the TG translation capability learned on our dataset transfers to other TR tasks and benchmarks. On top of that, we teach LLMs to perform deliberate reasoning over the TGs via Chain of Thought (CoT) bootstrapping and graph data augmentation. We observe that these strategies, which balance usefulness and diversity, yield more reliable CoTs and final results than vanilla CoT distillation.
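To make the idea of a temporal graph more concrete, here is a minimal Python sketch. It assumes a TG is a list of (subject, relation, object, start, end) edges with year-granularity, inclusive intervals. The `TemporalEdge` class, the helper functions, and the toy facts about fictional entities are all hypothetical illustrations. They are not the TGQA format or the TG-LLM implementation. Once facts are in this form, questions that are awkward to answer from raw text reduce to simple interval comparisons.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalEdge:
    """A hypothetical TG edge: (subject, relation, object) holding over [start, end]."""
    subject: str
    relation: str
    obj: str
    start: int  # first year the fact holds
    end: int    # last year the fact holds (inclusive)


def chronological(tg: List[TemporalEdge]) -> List[TemporalEdge]:
    """Sort edges by start time, breaking ties by end time."""
    return sorted(tg, key=lambda e: (e.start, e.end))


def overlaps(a: TemporalEdge, b: TemporalEdge) -> bool:
    """Closed intervals overlap iff each starts no later than the other ends."""
    return a.start <= b.end and b.start <= a.end


def find(tg: List[TemporalEdge], subject: str, relation: str) -> Optional[TemporalEdge]:
    """Return the first edge matching (subject, relation), or None."""
    for e in tg:
        if e.subject == subject and e.relation == relation:
            return e
    return None


# Toy graph with fictional entities, as a text-to-TG step might produce it.
tg = [
    TemporalEdge("Bob", "married_to", "Alice", 2010, 2020),
    TemporalEdge("Alice", "works_at", "Acme", 2001, 2008),
    TemporalEdge("Alice", "lives_in", "Paris", 2005, 2012),
]

# Q: Did Alice live in Paris while working at Acme?
job = find(tg, "Alice", "works_at")
home = find(tg, "Alice", "lives_in")
assert job is not None and home is not None
print(overlaps(job, home))  # True: [2001, 2008] and [2005, 2012] share 2005-2008

# Q: Which fact began first?
print(chronological(tg)[0].relation)  # works_at
```

In TG-LLM, the reasoning over the graph is carried out by the fine-tuned LLM through CoT rather than by hand-written rules. These helpers only illustrate the kind of interval logic such a chain of thought has to express.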