Ever since Large Language Models (LLMs) and related applications became broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if these are provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 Turbo was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements: for example, the output is more structured and consistent, invalid casing in a student program's output is accurately identified, and in some cases the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that a submission is correct while also asking for an error to be fixed. The present work deepens our understanding of LLMs' potential and limitations, of how to integrate them into e-assessment systems and pedagogical scenarios, and of how to instruct students who use applications based on GPT-4.
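The study's prompting setup pairs each task specification with a student submission in a single prompt. A minimal sketch of how such a chat-style prompt could be assembled is shown below; the function name, message wording, and example task are illustrative assumptions, not the authors' actual prompt:

```python
# Hypothetical sketch: assemble a feedback prompt from a task
# specification and a student submission, in the chat-message format
# used by GPT-4-class APIs. Names and wording are assumptions.

def build_feedback_prompt(task_specification: str,
                          student_submission: str) -> list[dict]:
    """Return a chat-style message list combining task and submission."""
    system_msg = (
        "You are a tutor in an introductory programming course. "
        "Give feedback on the student's submission for the task below "
        "without revealing a full solution."
    )
    user_msg = (
        f"Task specification:\n{task_specification}\n\n"
        f"Student submission:\n{student_submission}"
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]

if __name__ == "__main__":
    messages = build_feedback_prompt(
        "Print 'Hello, World!' exactly, including casing.",
        'print("hello, world!")',
    )
    print(messages[1]["content"])
```

The resulting message list would then be sent to the model; keeping the specification and submission in one user message mirrors the single-prompt design described above.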