We evaluate an automatic hint generator, powered by the GPT-4 large language model, for CS1 programming assignments. The system provides natural language guidance on how students can improve their incorrect solutions to short programming exercises, and a hint can be requested each time a student fails a test case. Our evaluation addresses three research questions: RQ1: Do the hints help students improve their code? RQ2: How effectively do the hints capture problems in student code? RQ3: Are the issues that students resolve the same as the issues addressed in the hints? To answer these questions quantitatively, we identified a set of fine-grained knowledge components and determined which ones apply to each exercise, incorrect solution, and generated hint. Comparing data from two large CS1 offerings, we found that access to the hints helps students address problems with their code more quickly, that the hints consistently capture the most pressing errors in students' code, and that hints addressing several issues at once, rather than a single bug, are more likely to lead to direct student progress.