Large language models (LLMs) trained to understand programming syntax now provide effective assistance to developers and are used in programming education, for example to generate coding problem examples or to explain code. A key aspect of programming education is understanding and dealing with error messages. However, 'logical errors', in which the program operates against the programmer's intentions, do not trigger error messages from the compiler. In this study, building on existing research on programming errors, we first define the types of logical errors that can occur in programming in general. Based on this definition, we propose an effective approach for detecting logical errors with LLMs that makes use of the relations among error types in Chain-of-Thought and Tree-of-Thought prompts. The experimental results indicate that when such logical error descriptions are included in the prompt, the average classification performance is about 21% higher than without them. We also conducted an experiment that exploits the relations among errors to generate a new logical error dataset using LLMs. As very few datasets for logical errors exist, such a benchmark dataset can be very useful for various programming-related applications. We expect that our work can assist novice programmers in identifying the causes of code errors and correcting them more effectively.
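As a concrete illustration of the notion of a logical error used above, the following minimal Python sketch shows code that runs without any error message yet violates the programmer's intent, together with one possible way to place error-type descriptions in a step-by-step prompt. The function names, the two error-type labels, and the prompt wording are all hypothetical and chosen for illustration only; they are not the taxonomy or the prompts used in this study.

```python
# Hypothetical example: the programmer intends to sum the integers 1..n inclusive.
def sum_to_n_buggy(n: int) -> int:
    total = 0
    for i in range(1, n):  # logical error: the loop stops at n - 1
        total += i
    return total


def sum_to_n_fixed(n: int) -> int:
    total = 0
    for i in range(1, n + 1):  # inclusive upper bound matches the intent
        total += i
    return total


# Illustrative error-type descriptions. The names and wording are assumptions,
# not the error types defined in the paper.
ERROR_DESCRIPTIONS = {
    "loop_boundary": "A loop runs one iteration too many or too few.",
    "wrong_operator": "An arithmetic or comparison operator does not match the intent.",
}


def build_prompt(code: str, descriptions: dict[str, str]) -> str:
    """Assemble a classification prompt that lists error types before the code."""
    lines = ["Classify the logical error in the code below.", "Error types:"]
    for name, text in descriptions.items():
        lines.append(f"- {name}: {text}")
    lines.append("Think step by step, then answer with one error type.")
    lines.append("Code:")
    lines.append(code)
    return "\n".join(lines)
```

Both functions execute without raising an exception, so neither a compiler nor an interpreter reports a problem; only a comparison against the intended result reveals the defect. The prompt builder simply prepends the descriptions to the code, which is one plausible reading of "logical error descriptions in the prompt".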