Conceptual Mutation Testing for Student Programming Misconceptions

Context: Students often misunderstand programming problem descriptions. This can lead them to solve the wrong problem, which creates frustration, obstructs learning, and imperils grades. Researchers have found that students understand a problem better when they write examples before they start programming. These examples are checked against correct and wrong implementations provided by course staff, a process analogous to mutation testing. Doing so results in better student understanding of the problem as well as better test suites to accompany the program, both of which are desirable educational outcomes.

Inquiry: Producing mutant implementations requires care. If there are too many, or they are too obscure, students will spend a lot of time on an unproductive task and become frustrated. Instead, we want a small number of mutants that each correspond to a common misconception about the problem. This paper presents a partially automated workflow to produce mutants of this form which, notably, are not those produced by mutation-testing tools.

Approach: We comb through student tests that fail a correct implementation. The student misconceptions are embedded in these failures. We then semantically cluster these failures and translate the resulting clusters into conceptual mutants. These mutants can then be run against student data to determine whether they are better than prior methods. Parts of this process are automated.

Knowledge: We find that student misconceptions illustrated by failing tests can be operationalized by the above process. The resulting mutants do much better at identifying student misconceptions than prior methods.

Grounding: Our findings are grounded in a manual analysis of student examples and a quantitative evaluation of both our clustering techniques and our process for making conceptual mutants. The clustering evaluation compares against a ground truth using standard cluster-correspondence measures, while the mutant evaluation examines how conceptual mutants perform against student data.

Importance: Our work contributes a workflow, with some automation, to reduce the cost and increase the effectiveness of generating conceptually interesting mutants. Such mutants can both improve learning outcomes and reduce student frustration, leading to better educational outcomes. In the process, we also identify a variation of mutation testing not commonly discussed in the software literature.
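
To make the idea of checking examples against correct and wrong implementations concrete, here is a minimal sketch in Python. It is not taken from the paper: the median problem, the "input is already sorted" misconception, and every name in the code (correct_median, mutant_assumes_sorted, passes, classify) are hypothetical and chosen only for illustration.

```python
from statistics import median

# Hypothetical problem: "return the median of a list of numbers".

def correct_median(xs):
    """Reference (correct) implementation."""
    return median(xs)

def mutant_assumes_sorted(xs):
    """Conceptual mutant (assumed misconception): the input is already
    sorted, so the middle element(s) are taken without sorting."""
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def passes(impl, example):
    """An example is a pair (inputs, expected_output)."""
    inputs, expected = example
    return impl(list(inputs)) == expected

def classify(examples, correct, mutants):
    """Split examples into valid (agree with the correct implementation)
    and invalid, and record which mutants the valid examples catch."""
    valid = [e for e in examples if passes(correct, e)]
    invalid = [e for e in examples if not passes(correct, e)]
    caught = {
        name: any(not passes(mutant, e) for e in valid)
        for name, mutant in mutants.items()
    }
    return valid, invalid, caught

# Hypothetical student examples: the last one is wrong for the real
# problem but consistent with the "already sorted" misconception.
student_examples = [
    ((1, 2, 3), 2),
    ((3, 1, 2), 2),
    ((3, 1, 2), 1),
]
mutants = {"assumes_sorted": mutant_assumes_sorted}
valid, invalid, caught = classify(student_examples, correct_median, mutants)
```

In this sketch the third example fails the correct implementation yet passes the mutant, which is the sense in which a failing example can carry a misconception. The second example is valid and catches the mutant, so a student who writes it has shown that they do not hold that misconception.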
