Large language models (LLMs) have made significant progress in natural language processing (NLP) and are used extensively across a wide range of applications. Recent work such as chain-of-thought (CoT) prompting has shown that generating intermediate reasoning steps can improve LLM performance on complex reasoning tasks, such as math problems and symbolic question answering. However, we observe that LLMs still struggle with temporal reasoning: our preliminary experiments show that generating intermediate reasoning steps does not always improve performance on complex temporal question-answering tasks. We therefore propose a novel framework that combines the information-extraction capability of LLMs with the logical reasoning capability of a Python solver to address this issue. Extensive experiments and analysis demonstrate the effectiveness of our framework on intricate time-bound reasoning tasks.
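The abstract does not specify the solver's interface. As a minimal sketch, assume the LLM has already extracted time-scoped facts from the context into a structured form. The `Fact` schema and the `holds_at` and `answer_at` functions below are hypothetical illustrations, not the authors' code. Under that assumption, the Python side can answer a point-in-time question with plain interval arithmetic:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    # Hypothetical schema for one time-scoped fact produced by an LLM extraction step.
    subject: str
    relation: str
    obj: str
    start: int            # first year the fact holds (inclusive)
    end: Optional[int]    # first year it no longer holds (exclusive); None = ongoing


def holds_at(fact: Fact, year: int) -> bool:
    """Return True if the fact's validity interval [start, end) contains `year`."""
    return fact.start <= year and (fact.end is None or year < fact.end)


def answer_at(facts: List[Fact], subject: str, relation: str, year: int) -> List[str]:
    """Answer 'What was <subject>'s <relation> in <year>?' deterministically."""
    matches = {
        f.obj
        for f in facts
        if f.subject == subject and f.relation == relation and holds_at(f, year)
    }
    return sorted(matches)  # sorted for a stable, comparable answer


if __name__ == "__main__":
    # Illustrative facts, as an extraction step might return them (invented example data).
    facts = [
        Fact("Alice", "employer", "Acme", 2010, 2015),
        Fact("Alice", "employer", "Globex", 2015, 2020),
        Fact("Alice", "employer", "Initech", 2020, None),
    ]
    print(answer_at(facts, "Alice", "employer", 2015))  # prints ['Globex']
```

In this sketch, once the facts are extracted, the temporal comparison is exact and does not depend on the model's step-by-step reasoning. That separation is what the division of labor between LLM extraction and a Python solver is meant to achieve. The half-open `[start, end)` convention is one assumed choice that avoids double-counting boundary years.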