ChatGPT demonstrates significant potential to revolutionize software engineering (SE) by exhibiting outstanding performance in SE tasks such as code and document generation. However, the high reliability and risk control requirements of SE raise concerns about ChatGPT's lack of interpretability. To address this concern, we conducted a study to evaluate the capabilities and limitations of ChatGPT for code analysis in SE. We break down the abilities that artificial intelligence (AI) models need for code-analysis tasks in SE into three categories: 1) syntax understanding, 2) static behavior understanding, and 3) dynamic behavior understanding. Our investigation focused on the ability of ChatGPT to comprehend code syntax and semantic structures, including abstract syntax trees (AST), control flow graphs (CFG), and call graphs (CG). We assessed the performance of ChatGPT on cross-language tasks involving C, Java, Python, and Solidity. Our findings revealed that while ChatGPT is good at understanding code syntax, it struggles to comprehend code semantics, particularly dynamic semantics. We conclude that ChatGPT possesses capabilities similar to those of an AST parser, demonstrating initial competencies in static code analysis. Furthermore, our study highlights that ChatGPT is susceptible to hallucination when interpreting code semantic structures, fabricating nonexistent facts. These results indicate the need to explore methods for verifying the correctness of ChatGPT's output to ensure its dependability in SE. More importantly, our study provides an initial answer to why code generated by large language models (LLMs) is usually syntactically correct but vulnerable.
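To make the distinction between syntax understanding and static behavior understanding concrete, the following is a minimal illustrative sketch, not taken from the study itself. It uses Python's standard `ast` module to list the syntactic node types in a small snippet and to derive a simple call graph from it. The sample program `SOURCE` and the helpers `node_types` and `call_graph` are hypothetical names introduced here for illustration only.

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = '''
def helper(x):
    return x + 1

def main(n):
    total = 0
    for i in range(n):
        total += helper(i)
    return total
'''


def node_types(source):
    """Syntax level: return the sorted set of AST node type names."""
    tree = ast.parse(source)
    return sorted({type(node).__name__ for node in ast.walk(tree)})


def call_graph(source):
    """Static level: map each top-level function to the plain names it calls.

    Assumption: only direct calls of the form name(...) are recorded;
    method calls and calls through variables are ignored.
    """
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph


if __name__ == "__main__":
    print(node_types(SOURCE))
    print(call_graph(SOURCE))
```

In this sketch, `call_graph(SOURCE)` maps `main` to the names `helper` and `range`, and `helper` to an empty set. Both results are recovered from the parsed source alone, without running it. Dynamic behavior, such as the value `main` returns for a given `n`, cannot be read off the tree in the same way, which is the category the abstract reports as most difficult.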