Supporting learners in introductory programming assignments at scale is a necessity. This support includes automated feedback on what learners did incorrectly. Existing approaches cast the problem as automatically repairing a learner's incorrect program by extrapolating from correct programs written by other learners. However, such approaches are limited because they only compare programs with similar control flow and statement order. Potentially valuable repair feedback that more flexible comparisons could yield is therefore lost. In this paper, we present several modifications to CLARA, an open-source, data-driven automated repair approach, so that it can deal with real-world introductory programs. We extend CLARA's abstract syntax tree processor to handle common introductory programming constructs. Additionally, we propose a flexible alignment algorithm over control flow graphs in which nodes are enriched with semantic annotations extracted from the operations and calls in each program. Using this alignment, we modify the incorrect program's control flow graph to match that of a correct program, so that CLARA's original repair process can be applied. We evaluate our approach against a baseline on the twenty most popular programming problems in Codeforces. Our results indicate that flexible alignment achieves a significantly higher percentage of successful repairs, 46% compared to 5% for baseline CLARA. Our implementation is available at https://github.com/towhidabsar/clara.
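To make the idea of aligning control flow graph nodes by their semantic annotations concrete, the following is a minimal sketch, not the algorithm implemented in CLARA. It assumes each node is summarized by a set of labels (operations and calls), uses Jaccard similarity as the matching score, and pairs nodes greedily; the function names, node identifiers, and labels are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def align(incorrect, correct):
    """Greedily pair nodes of two graphs by label similarity (illustrative only).

    Each argument maps a node id to the set of semantic labels
    (operations and calls) found in that node. Returns a dict from
    incorrect-node id to correct-node id; nodes with no overlapping
    labels are left unmatched.
    """
    # Score every candidate pair, best score first; ties broken by node id.
    scored = sorted(
        ((jaccard(incorrect[u], correct[v]), u, v)
         for u, v in product(incorrect, correct)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    mapping, used = {}, set()
    for score, u, v in scored:
        if score > 0 and u not in mapping and v not in used:
            mapping[u] = v
            used.add(v)
    return mapping


# Hypothetical annotated nodes: the two programs order their statements differently.
incorrect = {"n1": {"input", "int"}, "n2": {"print"}, "n3": {"<", "+="}}
correct = {"m1": {"input", "int"}, "m2": {"<", "+=", "%"}, "m3": {"print"}}
print(align(incorrect, correct))
```

Because the pairing depends on labels rather than on node position, the loop node `n3` is matched to `m2` even though the two programs place their output and loop nodes in a different order, which is the kind of comparison a strictly order-based matching would miss.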