LLMs can be applied to code analysis tasks such as code review and vulnerability analysis. However, the strengths and limitations of adopting LLMs for code analysis remain unclear. In this paper, we examine LLMs' capabilities in security-oriented program analysis from the perspectives of both attackers and security analysts. We focus on two representative LLMs, ChatGPT and CodeBERT, and evaluate their performance on typical analysis tasks of varying difficulty. Our study demonstrates the LLMs' efficiency in learning high-level semantics from code, positioning ChatGPT as a potential asset in security-oriented contexts. However, these models have notable limitations: their performance relies heavily on well-defined variable and function names, so they fail to learn from anonymized code. We believe the concerns raised in this case study deserve in-depth investigation in the future.
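To make the anonymization limitation concrete, here is a minimal sketch of what stripping identifiers means in practice. The function names and logic below are illustrative examples, not taken from the paper's benchmark: both functions compute the same thing, but the second removes the naming cues an LLM would otherwise exploit.

```python
def check_password_length(password, min_length=8):
    # Descriptive identifiers hint that this is a password-strength check.
    return len(password) >= min_length

def f1(a1, a2=8):
    # Semantically identical, but with anonymized identifiers; the intent
    # is no longer recoverable from names alone.
    return len(a1) >= a2

print(check_password_length("hunter2"))        # False: 7 characters < 8
print(f1("correct horse battery"))             # True: 21 characters >= 8
```

A model that infers purpose chiefly from identifiers can describe the first function but not the second, which is the failure mode the study observes on anonymized code.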