The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

Large Language Models (LLMs) have achieved strong performance across natural language processing tasks, yet reliable reasoning remains an open challenge. Although modern LLMs show progress in structured inference, multi-step problem solving, and contextual understanding, their reasoning behavior is often inconsistent and sensitive to prompting strategies, task design, and model scale. This survey provides a systematic analysis of more than 300 recent papers from arXiv, Semantic Scholar, Google Scholar, Papers with Code, and the ACL Anthology to examine how reasoning capabilities emerge in LLMs and where they fail. We make three main contributions. First, we introduce a structured taxonomy of LLM reasoning research, covering Chain-of-Thought reasoning, multi-hop reasoning, mathematical reasoning, commonsense reasoning, visual and temporal reasoning, code and algorithmic reasoning, retrieval-augmented reasoning, tool-augmented and agentic reasoning, and reinforcement learning-based reasoning. Second, we analyze methodological trends across these paradigms, including prompting methods, model architectures, training objectives, reward modeling, and evaluation benchmarks. Third, we synthesize recurring limitations and failure modes, such as reasoning hallucinations, brittle multi-step inference, weak causal abstraction, and poor cross-domain generalization. By organizing a rapidly expanding literature, this survey offers a unified view of the current capabilities and limitations of reasoning in LLMs. We also identify emerging research directions, including meta-reasoning, self-evolving reasoning frameworks, multimodal reasoning, and socially grounded reasoning. Overall, this work aims to serve as a reference for developing more robust, interpretable, and generalizable reasoning systems in future language models.
