Finding and fixing errors is time-consuming for novice and expert programmers alike. Prior work has identified frequent error patterns among programmers at various levels, but how these tendencies differ between novices and experts has yet to be revealed. Knowing which errors are frequent at each level would allow instructors to give advice tailored to each level of learner. In this paper, we propose a rule-based error classification tool that classifies errors in code pairs, each consisting of a wrong program and a correct program. We classify errors in 95,631 code pairs submitted by programmers of various levels to an online judge system and identify 3.47 errors per pair on average. We then use the classified errors to analyze how frequent errors differ between novice and expert programmers. The results show that, on the same introductory problems, errors made by novices stem from a lack of programming knowledge, and such mistakes are considered an essential part of the learning process. Errors made by experts, in contrast, stem from misunderstandings caused by careless reading of the problem statement or from the challenges of solving problems differently than usual. The proposed tool can be used to create error-labeled datasets and to support further code-related educational research.
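To make the idea of rule-based classification over wrong/correct code pairs concrete, the following is a minimal sketch and not the tool proposed in the paper. It assumes that a pair can be compared token by token with Python's standard `difflib`, and that each changed token is mapped to a label by a few hand-written rules. The tokenizer, the function names `classify_token_change` and `classify_pair`, and every label string are hypothetical and chosen only for illustration; the abstract does not specify the actual rules or categories.

```python
import difflib
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integers, a few two-character
# comparison operators, then any other single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|[^\s\w]")

COMPARISON = {"<", ">", "<=", ">=", "==", "!="}


def classify_token_change(old, new):
    """Map one replaced token to an illustrative (assumed) error label."""
    if old in COMPARISON and new in COMPARISON:
        return "wrong_comparison_operator"
    if old.isdigit() and new.isdigit():
        return "wrong_constant"
    if old.isidentifier() and new.isidentifier():
        return "wrong_identifier"
    return "other_replacement"


def classify_pair(wrong_src, correct_src):
    """Count rule-based error labels for one (wrong, correct) code pair."""
    wrong_tokens = TOKEN.findall(wrong_src)
    correct_tokens = TOKEN.findall(correct_src)
    matcher = difflib.SequenceMatcher(
        a=wrong_tokens, b=correct_tokens, autojunk=False
    )
    labels = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length replacement: apply the rules token by token.
            for old, new in zip(wrong_tokens[i1:i2], correct_tokens[j1:j2]):
                labels[classify_token_change(old, new)] += 1
        elif tag == "replace":
            labels["other_replacement"] += 1
        elif tag == "insert":
            # Tokens present only in the correct program.
            labels["missing_code"] += 1
        elif tag == "delete":
            # Tokens present only in the wrong program.
            labels["extra_code"] += 1
    return labels


wrong = "for i in range(n):\n    if a[i] < 0:\n        total += 1\n"
correct = "for i in range(n):\n    if a[i] <= 0:\n        total += 1\n"
print(classify_pair(wrong, correct))
```

Averaging the number of labels per pair over a whole dataset would give a figure comparable in kind to the per-pair error count reported above, although a real tool would likely work on parsed syntax trees rather than raw tokens.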