Automated generation of feedback on programming assignments holds significant benefits for programming education, especially for advanced assignments. Automated Program Repair techniques, particularly those based on Large Language Models (LLMs), have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform when repairing programs from higher-level programming courses. To address this gap, we curate a new dataset of advanced student assignments, named Defects4DS, from a higher-level programming course. We then identify the challenges involved in fixing bugs in advanced assignments. Based on this analysis, we develop PaR, an LLM-powered framework. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information into a comprehensive and informative prompt for the final Program Repair stage. The evaluation on Defects4DS and on the well-studied ITSP dataset shows that PaR achieves new state-of-the-art performance, with improvements of 19.94% and 15.2% in repair rate compared to prior state-of-the-art LLM-based and symbolic approaches, respectively.
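To make the first two phases more concrete, the following is a minimal sketch of how peer selection and prompt assembly could look. It is not the PaR implementation: the function names, the Jaccard and token-shape similarity measures, the equal weighting, the small keyword set, and the prompt layout are all assumptions chosen for illustration. The semantic criterion is left as an optional caller-supplied function (for example, one backed by a code embedding model), since the abstract does not specify how it is computed.

```python
import re
from difflib import SequenceMatcher

# Crude tokenizer: identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Small illustrative keyword set (assumption, not a full C keyword list).
KEYWORDS = {"int", "char", "void", "for", "while", "if", "else", "return"}


def tokens(src):
    return TOKEN_RE.findall(src)


def lexical_sim(a, b):
    """Jaccard overlap of the raw token sets (stand-in lexical criterion)."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def shape(src):
    """Abstract identifiers and numbers away, keeping only code structure."""
    out = []
    for t in tokens(src):
        if t in KEYWORDS:
            out.append(t)
        elif t[0].isalpha() or t[0] == "_":
            out.append("ID")
        elif t[0].isdigit():
            out.append("NUM")
        else:
            out.append(t)
    return out


def syntactic_sim(a, b):
    """Similarity of token shapes (stand-in for a real AST comparison)."""
    return SequenceMatcher(None, shape(a), shape(b), autojunk=False).ratio()


def select_peers(buggy, peers, k=1, semantic=None):
    """Rank peer programs by the unweighted mean of the available criteria.

    `peers` maps a hypothetical peer id to its source code. `semantic`, if
    given, is a function (buggy_src, peer_src) -> float in [0, 1].
    """
    scored = []
    for name, src in peers.items():
        parts = [lexical_sim(buggy, src), syntactic_sim(buggy, src)]
        if semantic is not None:
            parts.append(semantic(buggy, src))
        scored.append((name, sum(parts) / len(parts)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(problem, buggy, peer_src):
    """Combine several information sources into one repair prompt.

    The three sources used here are illustrative assumptions.
    """
    return (
        f"Problem description:\n{problem}\n\n"
        f"Reference solution from a peer:\n{peer_src}\n\n"
        f"Buggy program:\n{buggy}\n\n"
        "Fix the buggy program with minimal changes."
    )
```

In this sketch, `select_peers(buggy, peers, k=1)` returns the highest-scoring `(peer_id, score)` pair, and the corresponding source can be passed to `build_prompt` to form the input to the repair model. Averaging the criteria with equal weight is only one possible design; a real system would likely tune or learn how the criteria are combined.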