Automated program analysis is a pivotal research topic in many areas of Computer Science, in particular Formal Methods and Artificial Intelligence. Because program equivalence is undecidable, comparing two programs is highly challenging. Such a comparison typically requires a relation between the two programs' sets of variables. Mapping variables between two programs is therefore useful for a range of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the sets of variables of two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use cases of these mappings on the task of program repair, fixing well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state of the art in program repair, which depends heavily on the programs' structure, can repair only about 72% of the incorrect programs. In contrast, our approach, which is based solely on variable mappings, can repair around 88.5%.
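To illustrate how a variable mapping can support program repair, the following minimal sketch rewrites a correct program into the variable names of an incorrect one and then compares the two line by line. It is an illustration under stated assumptions, not the method evaluated in this work: the two programs, the helper names `RenameVariables`, `apply_mapping`, and `normalize`, and the wrong-comparison-operator bug are hypothetical, and the mapping is written by hand where a GNN would predict it from the ASTs.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename every variable occurrence according to a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Replace the identifier if the mapping covers it; otherwise keep it.
        node.id = self.mapping.get(node.id, node.id)
        return node


def apply_mapping(source, mapping):
    """Return `source` with its variables renamed via `mapping`."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


def normalize(source):
    """Round-trip through the AST so both programs share one layout."""
    return ast.unparse(ast.parse(source))


# Hypothetical incorrect submission: the comparison operator is wrong.
incorrect = "m = a\nif b < m:\n    m = b\n"
# Hypothetical correct program that uses different variable names.
correct = "largest = x\nif y > largest:\n    largest = y\n"

# Hand-written stand-in for a predicted mapping (assumption, not GNN output):
# variables of the correct program -> variables of the incorrect program.
mapping = {"largest": "m", "x": "a", "y": "b"}

correct_renamed = apply_mapping(correct, mapping)

# With both programs over the same variables, differing lines are
# candidate fault locations together with a candidate fix.
candidates = [
    (lineno, bad, good)
    for lineno, (bad, good) in enumerate(
        zip(normalize(incorrect).splitlines(), correct_renamed.splitlines()),
        start=1,
    )
    if bad != good
]
```

Here `candidates` contains a single entry pointing at line 2, where `if b < m:` would be replaced by `if b > m:`. A line-by-line comparison only works because the two hypothetical programs share the same statement layout; the sketch is meant to show why a common set of variable names makes the programs comparable at all.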