Reasoning is a fundamental capability of AI agents. Recently, large language models (LLMs) have shown remarkable abilities to perform reasoning tasks. However, numerous evaluations of the reasoning capabilities of LLMs have also shown some limitations. A prominent limitation is length generalization: when trained on reasoning problems of smaller lengths or sizes, the resulting models struggle with problems of larger lengths or sizes. This potentially indicates some theoretical limitations of generalization in learning reasoning skills. These evaluations and their observations motivated us to perform a theoretical study of the length generalization problem. This work focuses on reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs). It identifies and proves conditions that determine whether the length generalization problem can be solved for a reasoning task in a particular representation. Experiments are also conducted to verify the theoretical results.