Large language models (LLMs) are increasingly being explored in higher education, yet their effectiveness as teaching agents remains underexamined. In this paper, we present the development of GuideLM, a fine-tuned LLM designed for programming education. GuideLM has been integrated into the Debugging C Compiler (DCC), an educational C compiler that leverages LLMs to generate pedagogically sound error explanations. Previously, DCC relied on off-the-shelf OpenAI models, which, while accurate, often over-assisted students by directly providing solutions despite contrary prompting. To address this, we employed supervised fine-tuning (SFT) on a dataset of 528 student-question/teacher-answer pairs, creating two models: GuideLM and GuideLM-mini, fine-tuned from GPT-4o and GPT-4o-mini, respectively. We conducted an expert analysis of 400 responses per model, comparing their pedagogical effectiveness against the base OpenAI models. Our evaluation, grounded in constructivism and cognitive load theory, assessed factors such as conceptual scaffolding, clarity, and Socratic guidance. Results indicate that GuideLM and GuideLM-mini improve pedagogical performance, with an 8% increase in Socratic guidance and a 58% improvement in economy of words compared to GPT-4o. However, this refinement comes at the cost of a slight reduction in general accuracy. While further work is needed, our findings suggest that fine-tuning LLMs with targeted datasets is a promising approach for developing models better suited to educational contexts.
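The SFT setup described above pairs each student question with a teacher-authored answer. As a minimal sketch of how such pairs are typically prepared for OpenAI's chat fine-tuning pipeline, the snippet below formats one pair as a JSONL record; the system prompt and the example pair are illustrative assumptions, not taken from the paper's 528-pair dataset.

```python
import json

# Hypothetical tutoring-style system prompt (the paper's actual prompt is not shown).
SYSTEM_PROMPT = (
    "You are a programming tutor. Guide the student toward the answer with "
    "questions and conceptual hints; do not provide the full solution."
)


def to_sft_record(question: str, answer: str) -> str:
    """Serialize one student-question/teacher-answer pair as a JSONL line
    in the OpenAI chat fine-tuning schema (a `messages` array)."""
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    return json.dumps(record, ensure_ascii=False)


# Illustrative pair in the guiding, non-solution-revealing style the paper targets:
line = to_sft_record(
    "Why does my C program segfault when I scanf into an uninitialized pointer?",
    "What does your pointer point to before the call? Think about where scanf "
    "writes its result when the pointer has no allocated target.",
)
```

One line per training pair is written to a `.jsonl` file, which is then uploaded to the fine-tuning API; keeping the same system prompt at inference time helps the tuned model stay in the guiding register it was trained on.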