Large Language Models (LLMs), such as GPT and BERT, have demonstrated remarkable capabilities in natural language processing tasks. Recently, the release of ChatGPT has garnered significant attention due to its ability to analyze, comprehend, and synthesize information from user inputs. As a result, researchers in many domains have adopted these LLMs. In code analysis, researchers have applied LLMs to tasks such as code review and code generation. However, we observe that the strengths and limitations of applying these LLMs to code analysis have not been investigated. In this paper, we examine LLMs' capabilities in security-oriented program analysis from the perspectives of both attackers and security analysts. We focus on two representative LLMs, ChatGPT and CodeBERT, and evaluate their performance on typical analytic tasks of varying difficulty. Given the different natures of the two models, we conduct a qualitative analysis of ChatGPT's output and a quantitative analysis of CodeBERT. For ChatGPT, we present a case study involving several security-oriented program analysis tasks, deliberately introducing challenges to assess its responses. For CodeBERT, we systematically analyze and classify the features in code and quantitatively evaluate the impact of these features on the model's performance. Our study demonstrates LLMs' efficiency in learning high-level semantics from code, positioning ChatGPT as a potential asset in security-oriented contexts. However, the models have certain limitations, such as a heavy reliance on well-defined variable and function names, which leaves them unable to learn from anonymized code. We hope that our findings and analysis will offer valuable insights for future researchers in this domain.