Patch representation is crucial in automating various software engineering tasks, such as assessing patch accuracy or summarizing code changes. Recent research has applied deep learning to patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), but these approaches often miss the semantic intent of a change and the context of the modified lines. To bridge this gap, we introduce a novel method, Patcherizer. It models the intentions behind both context and structure, combining the surrounding code context with two novel representations: one captures the intention in the code changes, and the other captures the intention in the AST structural modifications before and after the patch. This holistic representation captures a patch's underlying intentions. Patcherizer employs graph convolutional neural networks to represent the structural intention graph and transformers to represent the intention sequence. We evaluated the versatility of Patcherizer's embeddings on three tasks: (1) patch description generation, (2) patch accuracy prediction, and (3) patch intention identification. Our experiments demonstrate the representation's efficacy across all three tasks, outperforming state-of-the-art methods. For example, in patch description generation, Patcherizer achieves average improvements of 19.39% in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR.