Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system

This paper presents the design and development of a framework that integrates a large language model (LLM) with a retrieval-augmented generation (RAG) approach that draws on both a knowledge graph and user interaction history. The framework is incorporated into a previously developed adaptive learning support system to assess learners' code, generate formative feedback, and recommend exercises. Moreover, this study examines learner preferences across three instructional modes: adaptive, generative AI (GenAI), and hybrid GenAI-adaptive. An experimental study compared learners' performance and perceptions, as well as the effectiveness of the three modes, using four key log features derived from 4956 code submissions across all experimental groups. The analysis shows that learners receiving feedback from the GenAI modes had significantly more correct code submissions and fewer submissions missing essential programming logic than those receiving feedback from the adaptive mode. In particular, the hybrid GenAI-adaptive mode achieved the highest number of correct submissions and the fewest incorrect or incomplete attempts, outperforming both the adaptive-only and GenAI-only modes. Questionnaire responses further indicated that GenAI-generated feedback was widely perceived as helpful, while all modes were rated positively for ease of use and usefulness. These results suggest that the hybrid GenAI-adaptive mode outperforms the other two modes across all measured log features.
