Automated program analysis is a pivotal research domain in many areas of Computer Science, in particular Formal Methods and Artificial Intelligence. Because program equivalence is undecidable, comparing two programs is highly challenging. Comparing two programs typically requires a relation between their sets of variables. Thus, mapping variables between two programs is useful for a wide range of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair, fixing well-studied bugs that recur among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state of the art in program repair, which depends heavily on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is based solely on variable mappings, can repair around 88.5%.
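To make the notion of a variable mapping concrete, the following is a minimal sketch in Python. It is not the GNN-based approach proposed here: it is a naive baseline that pairs variables by the order in which they are first assigned in each program's AST. Python is used only for illustration, and the function names `assigned_variables` and `map_variables` and the two example programs are hypothetical.

```python
import ast


def assigned_variables(source: str) -> list[str]:
    """Return the names assigned in `source`, ordered by first assignment."""
    tree = ast.parse(source)
    # Keep only names in Store context (assignment targets, loop variables),
    # which excludes reads of built-ins such as `print` or `range`.
    stores = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]
    # ast.walk gives no ordering guarantee, so sort by source position.
    stores.sort(key=lambda node: (node.lineno, node.col_offset))
    ordered: list[str] = []
    for node in stores:
        if node.id not in ordered:
            ordered.append(node.id)
    return ordered


def map_variables(incorrect_src: str, correct_src: str) -> dict[str, str]:
    """Naive baseline (assumption): pair variables by first-assignment order."""
    return dict(zip(assigned_variables(incorrect_src),
                    assigned_variables(correct_src)))


# Hypothetical incorrect/correct pair: the loop bound differs (off by one).
incorrect = (
    "n = 5\n"
    "total = 0\n"
    "for i in range(n):\n"
    "    total = total + i\n"
    "print(total)\n"
)
correct = (
    "limit = 5\n"
    "acc = 0\n"
    "for k in range(limit + 1):\n"
    "    acc = acc + k\n"
    "print(acc)\n"
)

mapping = map_variables(incorrect, correct)
print(mapping)
```

Such an ordering heuristic breaks as soon as two programs declare their variables in a different order or use a different number of variables, which is the kind of case a learned mapping over the ASTs is meant to handle. Once a mapping is available, a repair tool can rename expressions from the correct program into the incorrect program's vocabulary before comparing or transplanting them.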