Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education because they are easy to administer and grade, and they are a reliable format for both assessment and practice. An important aspect of MCQs is the distractors, i.e., incorrect options designed to target specific misconceptions or insufficient knowledge among students. To date, crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which limits scalability. In this work, we explore the task of automatically generating distractors and corresponding feedback messages for math MCQs using large language models. We establish a formulation of these two tasks and propose a simple, in-context learning-based solution. Moreover, we explore two non-standard metrics for evaluating the quality of the generated distractors and feedback messages. We conduct extensive experiments on these tasks using a real-world MCQ dataset that contains student response information. Our findings suggest that there is substantial room for improvement in automated distractor and feedback generation. We also outline several directions for future work.