Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they rely primarily on post-selection or predefined strategies, leaving an open question: can LLMs autonomously adapt their reasoning strategy based on their inherent capabilities? In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective, adaptive reasoning decisions aligned with their capabilities.
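The base-LLM-aware data selection described above can be sketched roughly as follows. This is a minimal, hypothetical illustration only, not the paper's actual procedure: the function names, the sampling budget `k`, and the tie-breaking rule are all assumptions. The idea is that, for each candidate SFT example, the base model is probed under both a CoT-style and a TIR-style prompt, and the solution style it handles better is kept in the training set.

```python
def probe_accuracy(solve, question, answer, k=8):
    """Estimate base-model accuracy on `question` over k sampled attempts.

    `solve` stands in for sampling the base LLM under a fixed prompt style
    (CoT or TIR); here it is any callable returning a final answer string.
    """
    return sum(solve(question) == answer for _ in range(k)) / k


def select_sft_example(question, answer, cot_solution, tir_solution,
                       cot_solve, tir_solve, k=8):
    """Keep the reasoning style the base model is better suited to.

    Ties favor CoT here (a hypothetical choice): CoT avoids tool calls at
    test time, matching the inference-efficiency motivation above.
    """
    acc_cot = probe_accuracy(cot_solve, question, answer, k)
    acc_tir = probe_accuracy(tir_solve, question, answer, k)
    return (question, cot_solution if acc_cot >= acc_tir else tir_solution)


# Toy base model: unreliable at free-form arithmetic (CoT) but exact
# when it writes code (TIR), so the TIR solution is selected for SFT.
chosen = select_sft_example(
    question="What is 17 * 23?",
    answer="391",
    cot_solution="17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.",
    tir_solution="```python\nprint(17 * 23)\n```  # -> 391",
    cot_solve=lambda q: "381",   # base model slips on mental arithmetic
    tir_solve=lambda q: "391",   # tool execution is exact
)
print(chosen[1].startswith("```python"))  # -> True
```

Because the probe depends on the base model's own success rates, two different base LLMs given the same question pool end up with different SFT mixtures, which is what lets each model settle into the strategy split that suits its aptitude.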