This paper introduces Llama-Polya, an instruction-tuned large language model that integrates Polya's four-step problem-solving framework into its dialogue structure to support mathematical reasoning. Mathematical problem-solving is central to students' success in mathematics education, yet many learners struggle to plan, justify, and verify their solutions. Although large language models (LLMs) show promise as intelligent tutors, they often lack structured pedagogical alignment grounded in established learning theories. To address this gap, we operationalize Polya's problem-solving framework within an instruction-tuned LLM to promote metacognitive engagement, and we examine the effects of pedagogy-aligned fine-tuning compared with domain-only and general-purpose instruction tuning. Built on the Llama-3.1-8B architecture, Llama-Polya was fine-tuned on synthetic math problem-solving data derived from GSM8K and structured according to Polya's four stages. We developed and evaluated multiple variants (general-purpose instruct, math-domain metamath, pedagogy-aligned polya-v2, and sequential metamath+polya-v2) using both quantitative accuracy metrics and qualitative pedagogical assessments. Results indicate that models tuned with Polya's framework and domain-specific data produced more balanced reasoning-stage distributions and fewer premature answers. Expert evaluators also observed improved pedagogical coherence and metacognitive prompting, although limitations in personalization and mathematical rigor remained. These findings suggest that pedagogy-grounded instruction tuning can enhance educational alignment and reasoning transparency in LLM-based tutoring systems.