Pre-trained code generation models (PCGMs) have been widely applied in neural code generation, which generates executable code from functional descriptions written in natural language, possibly together with method signatures. Despite substantial performance improvements in PCGMs, the role of method names in neural code generation has not been thoroughly investigated. In this paper, we study and demonstrate the potential of leveraging method names to enhance the performance of PCGMs from a model robustness perspective. Specifically, we propose a novel approach named RADAR (neuRAl coDe generAtor Robustifier). RADAR consists of two components: RADAR-Attack and RADAR-Defense. The former attacks a PCGM by generating adversarial method names as part of the input. These names are semantically and visually similar to the original method names but may trick the PCGM into generating completely unrelated code snippets. As a countermeasure to such attacks, RADAR-Defense synthesizes a new method name from the functional description and supplies it to the PCGM. Evaluation results show that RADAR-Attack can reduce the CodeBLEU of generated code by 19.72% to 38.74% for three state-of-the-art PCGMs (i.e., CodeGPT, PLBART, and CodeT5) in the fine-tuning code generation task, and can reduce the Pass@1 of generated code by 32.28% to 44.42% for three other state-of-the-art PCGMs (i.e., Replit, CodeGen, and CodeT5+) in the zero-shot code generation task. Moreover, RADAR-Defense is able to restore the performance of PCGMs by supplying synthesized method names. These results highlight the importance of good method names in neural code generation and indicate the benefits of studying model robustness in software engineering.
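The abstract reports zero-shot results as Pass@1 but does not define the metric. A widely used definition is the unbiased pass@k estimator introduced with the HumanEval benchmark. The sketch below assumes that definition, which the abstract does not confirm. For each problem, n samples are generated and c of them pass all unit tests. The per-problem estimate is 1 - C(n-c, k) / C(n, k), and the benchmark score is the mean over problems. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and used only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed definition).

    n: number of generated samples, c: samples passing all tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k draw contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem outcomes: (samples generated, samples passing).
print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```

For k = 1 the estimator reduces to c / n, the fraction of passing samples. Under this reading, a lower Pass@1 under RADAR-Attack means that fewer generated samples pass the tests.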