A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education

Jacob Doughty, Zipiao Wan, Anishka Bompelli, Jubahed Qayum, Taozhi Wang, Juran Zhang, Yujia Zheng, Aidan Doyle, Pragnya Sridhar, Arav Agarwal, Christopher Bogart, Eric Keylor, Can Kultur, Jaromir Savelka, Majd Sakr

There is a constant need for educators to develop and maintain effective, up-to-date assessments. While there is a growing body of research in computing education on using large language models (LLMs) to generate and engage with coding exercises, the use of LLMs to generate programming multiple-choice questions (MCQs) has not been extensively explored. We analyzed the capability of GPT-4 to produce MCQs aligned with specific learning objectives (LOs) from Python programming classes in higher education. Specifically, we developed an LLM-powered (GPT-4) system that generates MCQs from high-level course context and module-level LOs. We evaluated 651 LLM-generated and 449 human-crafted MCQs aligned to 246 LOs from 6 Python courses. We found that GPT-4 was capable of producing MCQs with clear language, a single correct choice, and high-quality distractors. We also observed that the generated MCQs appeared to be well aligned with the LOs. Our findings can be leveraged by educators who wish to take advantage of state-of-the-art generative models to support MCQ authoring.
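The abstract does not describe the system's prompts or output format. Purely as an illustration, the Python sketch below shows one way a pipeline of this kind could combine high-level course context with a single module-level LO in a generation prompt, and then apply cheap structural checks for two of the criteria named above (a single correct choice, and options that are at least distinct). The `MCQ` class, `build_prompt`, `passes_basic_checks`, and the prompt wording are all hypothetical and are not taken from the paper; no model is called, and alignment with the LO is not something these checks can capture.

```python
from dataclasses import dataclass, field


@dataclass
class MCQ:
    """Hypothetical record for one multiple-choice question (not the paper's format)."""
    stem: str                                        # the question text
    choices: list[str]                               # options shown to the student
    correct: set[int] = field(default_factory=set)   # indices of the correct option(s)
    learning_objective: str = ""                     # the LO the question targets


def build_prompt(course_context: str, learning_objective: str, n_choices: int = 4) -> str:
    """Assemble a generation prompt from course context and one module-level LO.

    The wording is an illustrative assumption, not the authors' prompt.
    """
    return (
        f"Course context:\n{course_context}\n\n"
        f"Learning objective:\n{learning_objective}\n\n"
        f"Write one multiple-choice question with {n_choices} options "
        "that assesses this learning objective. Exactly one option must be "
        "correct; the others should be plausible distractors."
    )


def passes_basic_checks(q: MCQ) -> bool:
    """Structural checks only: exactly one valid correct index, and no duplicate
    options (necessary, but far from sufficient, for good distractors)."""
    single_correct = len(q.correct) == 1 and all(0 <= i < len(q.choices) for i in q.correct)
    distinct_options = len({c.strip().lower() for c in q.choices}) == len(q.choices)
    return single_correct and distinct_options


if __name__ == "__main__":
    q = MCQ(
        stem="What does len([1, 2, 3]) return?",
        choices=["2", "3", "4", "An error"],
        correct={1},
        learning_objective="Use built-in functions on lists.",
    )
    print(passes_basic_checks(q))  # True
```

Checks like these could only filter out malformed output; judging clarity, distractor quality, and LO alignment still requires human review.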