BugSpotter: Automated Generation of Code Debugging Exercises

Debugging is an essential skill when learning to program, yet how it is taught and how much emphasis it receives vary widely across introductory courses. In the era of code-generating large language models (LLMs), students' ability to reason about code and identify errors is increasingly important. However, students frequently resort to trial and error to resolve bugs without fully understanding the underlying issues. Developing the ability to identify bugs and hypothesize about their causes is crucial, but it can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, a tool that uses an LLM to generate buggy code from a problem description and verifies the synthesized bugs with a test suite. Students interact with BugSpotter by designing failing test cases, that is, inputs for which the buggy code's output differs from the expected result defined by the problem specification. This gives students opportunities not only to improve their debugging skills but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated with exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well matched to the problem specifications. Importantly, student performance on the LLM-generated exercises was comparable to performance on the instructor-created ones, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.
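
To make the exercise format concrete, the following minimal Python sketch illustrates the two checks the abstract describes: verifying a synthesized bug against a test suite, and deciding whether a student's test case exposes it. This is not BugSpotter's actual implementation. The example problem (largest element of a non-empty list), the hand-written bug, and all function names are hypothetical, and a reference solution is assumed to stand in for the problem specification.

```python
from typing import Callable, List, Sequence, Tuple

Solution = Callable[[List[int]], int]
TestCase = Tuple[Sequence[int], int]  # (input, expected output)


def reference_largest(values: List[int]) -> int:
    """Assumed reference solution standing in for the problem specification."""
    return max(values)


def buggy_largest(values: List[int]) -> int:
    """Hypothetical synthesized bug: starts from 0 instead of the first element."""
    largest = 0
    for v in values:
        if v > largest:
            largest = v
    return largest


def bug_is_verified(buggy: Solution, suite: Sequence[TestCase]) -> bool:
    """Treat a bug as verified if at least one suite test fails on the buggy code."""
    return any(buggy(list(inp)) != expected for inp, expected in suite)


def student_test_spots_bug(
    reference: Solution, buggy: Solution, test_input: Sequence[int], expected: int
) -> bool:
    """A student's test succeeds if its expected value agrees with the
    specification and the buggy code produces a different output."""
    matches_spec = reference(list(test_input)) == expected
    exposes_bug = buggy(list(test_input)) != expected
    return matches_spec and exposes_bug


if __name__ == "__main__":
    suite = [([1, 2, 3], 3), ([-5, -2], -2)]
    print(bug_is_verified(buggy_largest, suite))
    print(student_test_spots_bug(reference_largest, buggy_largest, [-4, -9], -4))
```

In this sketch, a test case is credited only when both conditions hold, so a student cannot succeed by supplying an expected value that contradicts the specification. An all-positive input such as `[1, 2]` would not expose this particular bug, which is the kind of reasoning the exercise is meant to encourage.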
