Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well pre-trained models can understand and perform code execution. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution, which challenges existing models such as Codex. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension. We evaluate CodeExecutor on code execution and show its promising performance and limitations. We also demonstrate its potential benefits for code intelligence tasks such as zero-shot code-to-code search and text-to-code generation. Our analysis provides insights into the learning and generalization abilities of pre-trained models for code execution.
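To make the two central ideas concrete, the following is a minimal sketch of (a) a mutation operator that derives a program variant from a seed program and (b) an execution trace recorded for a program. It is an illustration under stated assumptions, not the procedure used to build the dataset: the operator `ConstantMutator`, the helpers `mutate` and `run_with_trace`, and the trace format (line number plus a snapshot of variables) are all hypothetical names and choices introduced here. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast
import sys


class ConstantMutator(ast.NodeTransformer):
    """Hypothetical mutation operator: add 1 to every integer literal."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value + 1), node)
        return node


def mutate(source):
    """Return the source of a mutated variant of the given program."""
    tree = ConstantMutator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def _visible(variables):
    """Drop dunder names such as __builtins__ from a variable snapshot."""
    return {k: v for k, v in variables.items() if not k.startswith("__")}


def run_with_trace(source):
    """Execute a program and record its trace.

    Each step is (line number, variables visible just before that line runs).
    Returns (steps, final variables). This trace format is an assumption.
    """
    steps = []

    def tracer(frame, event, arg):
        # Only follow frames that belong to the program under study.
        if frame.f_code.co_filename != "<program>":
            return None
        if event == "line":
            steps.append((frame.f_lineno, _visible(dict(frame.f_locals))))
        return tracer

    code = compile(source, "<program>", "exec")
    env = {}
    previous = sys.gettrace()  # restore any tracer that was already installed
    sys.settrace(tracer)
    try:
        exec(code, env)
    finally:
        sys.settrace(previous)
    return steps, _visible(env)


if __name__ == "__main__":
    seed = "x = 1\ny = x + 2\n"
    variant = mutate(seed)
    print(variant)
    for lineno, state in run_with_trace(variant)[0]:
        print(lineno, state)
```

A model trained for code execution would be asked to predict such a trace from the source alone. Mutating seed programs yields many variants whose traces differ, which is one plausible way to scale up the training data.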