Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind humans in learning efficiency. This disparity is often attributed to the inherent human capacity to learn from basic examples, gradually generalize to handle more complex problems, and refine skills through continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-student progressive learning framework that emulates the teacher-student education process to improve the efficacy of model fine-tuning. The framework operates on an interactive \textit{basic-generalized-harder} loop. The teacher agent provides tailored feedback on the student's answers and systematically organizes the education process: it first teaches the student basic examples, then reinforces understanding through generalized questions, and finally enhances learning by posing progressively more complex questions. With the teacher's guidance, the student learns to iteratively refine its answers using feedback and forms a robust and comprehensive understanding of the posed questions. The resulting systematic procedural data, which reflects the progressive learning process of humans, is then used for model training. Taking math reasoning as a testbed, experiments show that training LLaMA2 with data from YODA yields significant gains over standard SFT (+17.01\% on GSM8K and +9.98\% on MATH). In addition, we find that training with curriculum learning further improves learning robustness.
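The control flow of the basic-generalized-harder loop can be illustrated with a minimal Python sketch. It is not the authors' implementation: in YODA both agents are LLMs, whereas here `ToyTeacher`, `ToyStudent`, `teach`, and `basic_generalized_harder` are hypothetical stand-ins on a toy integer-summation task. They exist only to make the sequence runnable: teach a basic example, generalize it, raise the difficulty, and let the student refine each answer with the teacher's feedback, keeping the full refinement trace as procedural data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One procedural example: the question plus its feedback/refinement trace."""
    stage: str                                   # "basic", "generalized" or "harder"
    question: tuple[int, ...]
    attempts: list[int] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)


class ToyTeacher:
    """Hypothetical teacher for a toy task: summing a tuple of integers."""

    def generalize(self, q):
        # Same structure, different surface values.
        return [tuple(x + k for x in q) for k in (1, 2)]

    def make_harder(self, q):
        # "Harder" here simply means one more term to add.
        return [q + (q[-1] + 1,)]

    def critique(self, q, answer):
        # Tailored feedback: say whether the answer is right and what to recheck.
        if answer == sum(q):
            return True, "correct"
        return False, f"wrong: recheck every term, the sum has {len(q)} terms"


class ToyStudent:
    """Hypothetical student that drops the last term on long inputs until given feedback."""

    def answer(self, q, feedback=None):
        if len(q) > 2 and feedback is None:
            return sum(q[:-1])  # systematic first-attempt mistake
        return sum(q)


def teach(q, stage, teacher, student, max_rounds=3):
    """Let the student answer, receive feedback, and refine until correct."""
    rec = Record(stage, q)
    feedback = None
    for _ in range(max_rounds):
        ans = student.answer(q, feedback)
        ok, feedback = teacher.critique(q, ans)
        rec.attempts.append(ans)
        rec.feedback.append(feedback)
        if ok:
            break
    return rec


def basic_generalized_harder(basic, teacher, student):
    """Run one basic -> generalized -> harder round and return its procedural records."""
    records = [teach(basic, "basic", teacher, student)]
    for q in teacher.generalize(basic):
        records.append(teach(q, "generalized", teacher, student))
    for q in teacher.make_harder(basic):
        records.append(teach(q, "harder", teacher, student))
    return records
```

For example, `basic_generalized_harder((3, 4), ToyTeacher(), ToyStudent())` answers the basic and generalized questions correctly on the first try, while the harder question `(3, 4, 5)` needs one round of feedback (attempts `[7, 12]`). Keeping the wrong attempt together with the feedback, rather than only the final answer, is what makes the records "procedural" in the sense the abstract describes.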