Patch representation is crucial for automating software engineering tasks such as assessing patch accuracy or summarizing code changes. Recent work has applied deep learning to patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), but these approaches often miss the semantic intent of a change and the context of the modified lines. To bridge this gap, we introduce Patcherizer, a novel method that models the intentions expressed in both context and structure by combining the surrounding code context with two new representations: one captures the intention in the code changes, and the other captures the intention in the structural modifications of the AST before and after the patch. This holistic representation captures a patch's underlying intentions. Patcherizer uses graph convolutional neural networks to represent the structural intention graph and transformers to represent the intention sequence. We evaluated the versatility of Patcherizer's embeddings on three tasks: (1) patch description generation, (2) patch accuracy prediction, and (3) patch intention identification. Our experiments demonstrate the effectiveness of the representation on all three tasks, where it outperforms state-of-the-art methods. For example, in patch description generation, Patcherizer achieves average improvements of 19.39% in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR.
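To make the inputs named in the abstract concrete, the following is a minimal sketch of how a patch might be split into surrounding context, removed lines, and added lines before any encoding takes place. It is not Patcherizer's actual preprocessing: the function name `split_patch`, the returned dictionary layout, and the assumption of a single-file unified diff are all hypothetical choices made for illustration.

```python
def split_patch(patch_text):
    """Split a single-file unified diff into context, removed, and added lines.

    Illustrative only: assumes one file per patch and ignores everything
    that appears before the first hunk header ("@@ ... @@").
    """
    context, removed, added = [], [], []
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("@@"):
            # A hunk header marks the start of actual code lines.
            in_hunk = True
            continue
        if not in_hunk:
            # File headers such as "--- a/..." and "+++ b/..." are skipped.
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
        else:
            # Unchanged lines carry a single leading space in unified diffs.
            context.append(line[1:] if line.startswith(" ") else line)
    return {"context": context, "removed": removed, "added": added}


# Hypothetical example patch, invented for this sketch.
EXAMPLE_PATCH = """\
--- a/Calc.java
+++ b/Calc.java
@@ -1,3 +1,3 @@
 int div(int a, int b) {
-    return a / b;
+    return b == 0 ? 0 : a / b;
 }
"""

parts = split_patch(EXAMPLE_PATCH)
```

In this sketch, `parts["context"]` holds the unchanged surrounding code, while `parts["removed"]` and `parts["added"]` hold the two sides of the change. A model of the kind described in the abstract would consume such pieces (together with the ASTs before and after the patch), but how Patcherizer encodes them is not specified here.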