Large language models (LLMs), such as GPT and BERT, were proposed for natural language processing (NLP) and have shown promising results as general-purpose language models. A growing number of industry professionals and researchers are adopting LLMs for program analysis tasks. However, one significant difference between programming languages and natural languages is that a programmer is free to assign arbitrary names to variables, methods, and functions in a program, whereas a natural language writer has no such freedom. Intuitively, the quality of naming in a program affects the performance of LLMs on program analysis tasks. This paper investigates how naming affects LLMs on code analysis tasks. Specifically, we create a set of datasets of code containing nonsense or misleading names for variables, methods, and functions, respectively. We then use well-trained models (CodeBERT) to perform code analysis tasks on these datasets. The experimental results show that naming has a significant impact on the performance of LLM-based code analysis tasks, indicating that LLM-based code representation learning relies heavily on well-defined names in code. Additionally, we conduct a case study on some special code analysis tasks using GPT, which provides further insights.
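To make the dataset construction concrete, the following is a minimal sketch of how nonsense names could be substituted into a program. It is an illustration under stated assumptions, not the procedure used in this paper: the target language (Python), the helper names `collect_defined_names`, `Renamer`, and `obfuscate`, and the `v0`, `v1`, ... naming scheme are all hypothetical. It requires Python 3.9 or later for `ast.unparse`, and it deliberately ignores keyword arguments at call sites, attributes, imports, class names, and collisions with identifiers that already look like `v0`.

```python
import ast


def collect_defined_names(tree):
    """Return names the program itself defines: functions, parameters, assigned variables."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.append(node.name)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.append(node.id)
    return names


class Renamer(ast.NodeTransformer):
    """Rewrite every occurrence of a mapped identifier; leave all other names alone."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # descend into parameters and body
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)  # descend into any annotation
        return node

    def visit_Name(self, node):
        # Builtins such as len() are not in the mapping, so they are kept.
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate(source):
    """Return (rewritten source, old-name -> new-name mapping)."""
    tree = ast.parse(source)
    mapping = {}
    for name in collect_defined_names(tree):
        # Hypothetical naming scheme: v0, v1, ... (assumed, not from the paper).
        mapping.setdefault(name, f"v{len(mapping)}")
    tree = Renamer(mapping).visit(tree)
    return ast.unparse(tree), mapping


SAMPLE = '''
def average(values):
    total = 0
    for item in values:
        total += item
    return total / len(values)
'''

obfuscated, mapping = obfuscate(SAMPLE)
print(obfuscated)
```

Because each identifier is mapped to the same replacement everywhere it occurs, the rewritten program behaves like the original while carrying no lexical hint about its purpose. A misleading-name variant could be sketched in the same way by permuting the mapping among existing meaningful names instead of generating opaque ones.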