The Scope of ChatGPT in Software Engineering: A Thorough Investigation

ChatGPT demonstrates immense potential to transform software engineering (SE), with outstanding performance in tasks such as code and document generation. However, SE demands high reliability and strict risk control, which makes ChatGPT's lack of interpretability a concern. To address this issue, we carried out a study evaluating ChatGPT's capabilities and limitations in SE. We broke down the abilities an AI model needs to tackle SE tasks into three categories: 1) syntax understanding, 2) static behavior understanding, and 3) dynamic behavior understanding. Our investigation focused on ChatGPT's ability to comprehend code syntax and semantic structures, including abstract syntax trees (AST), control flow graphs (CFG), and call graphs (CG). We assessed ChatGPT's performance on cross-language tasks involving C, Java, Python, and Solidity. Our findings reveal that while ChatGPT excels at understanding code syntax (AST), it struggles to comprehend code semantics, particularly dynamic semantics. We conclude that ChatGPT possesses capabilities akin to those of an AST parser, demonstrating initial competence in static code analysis. Our study also shows that ChatGPT is susceptible to hallucination when interpreting code semantic structures, fabricating facts that do not exist. These results underscore the need to explore methods for verifying the correctness of ChatGPT's outputs to ensure its dependability in SE. More importantly, our study provides an initial answer to why code generated by LLMs is usually syntactically correct but vulnerable.
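For readers unfamiliar with the structures named above, the following minimal sketch shows what a syntax tree, a call graph, and the branching constructs that shape a control flow graph look like for a toy program. It is an illustration only and not the study's tooling: it uses Python's standard `ast` module, and the sample program and the helper names `call_graph` and `branch_nodes` are hypothetical.

```python
import ast

# Hypothetical sample program used only for illustration.
SOURCE = '''
def helper(x):
    return x + 1

def main(values):
    total = 0
    for v in values:
        if v > 0:
            total += helper(v)
    return total
'''

# Syntax level: the AST produced by the parser.
tree = ast.parse(SOURCE)


def call_graph(module):
    """Map each top-level function to the names it calls directly.

    Only plain-name calls such as helper(v) are recorded; method calls
    and dynamic dispatch are ignored in this simplified sketch.
    """
    graph = {}
    for node in module.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = callees
    return graph


def branch_nodes(module):
    """List the branching constructs in each top-level function.

    These are the statements that would introduce extra edges in a
    control flow graph; building the full CFG is out of scope here.
    """
    result = {}
    for node in module.body:
        if isinstance(node, ast.FunctionDef):
            kinds = [
                type(child).__name__
                for child in ast.walk(node)
                if isinstance(child, (ast.If, ast.For, ast.While))
            ]
            result[node.name] = sorted(kinds)
    return result


if __name__ == "__main__":
    print([type(n).__name__ for n in tree.body])  # top-level AST nodes
    print(call_graph(tree))
    print(branch_nodes(tree))
```

The AST and call graph can be read off the source text alone, whereas dynamic behavior (for example, how many times `helper` actually runs) depends on the input passed to `main`.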

