Security Weaknesses of Copilot Generated Code in GitHub

Modern code generation tools use AI models, particularly Large Language Models (LLMs), to generate functional and complete code. While such tools are becoming popular and widely available to developers, their use often brings security challenges. It is therefore important to assess the quality of the generated code, especially its security. Researchers have recently explored various aspects of code generation tools, including security. However, many open questions about the security of generated code still require investigation, especially the security issues of automatically generated code in the wild. To address this, we conducted an empirical study of the security weaknesses in code snippets generated by GitHub Copilot that appear in publicly available projects hosted on GitHub. The goal is to investigate the types and scale of security issues in real-world scenarios rather than crafted ones. We identified 435 Copilot-generated code snippets from publicly available projects and then conducted an extensive security analysis to identify Common Weakness Enumeration (CWE) instances in them. The results show that (1) 35.8% of Copilot-generated code snippets contain CWEs, and these issues are spread across multiple programming languages; (2) the security weaknesses are diverse and relate to 42 different CWEs, of which CWE-78: OS Command Injection, CWE-330: Use of Insufficiently Random Values, and CWE-703: Improper Check or Handling of Exceptional Conditions occurred most frequently; and (3) 11 of the 42 identified CWEs belong to the 2022 CWE Top-25. Our findings confirm that developers should be careful when adding code generated by Copilot (and similar AI code generation tools) and should run appropriate security checks when accepting suggested code.
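To make the three most frequent weakness classes more concrete, the Python sketch below contrasts a typical vulnerable pattern with a safer alternative for each of CWE-78, CWE-330, and CWE-703. These snippets are hypothetical illustrations written for this summary; they are not taken from the 435 analyzed snippets, and every function name here (for example `build_shell_command_unsafe` or `parse_port_checked`) is invented for the example.

```python
import random
import secrets
import shlex
import string

# --- CWE-78: OS Command Injection (hypothetical illustration) -----------
def build_shell_command_unsafe(filename: str) -> str:
    # VULNERABLE pattern: untrusted input is interpolated into a string that
    # would be run with subprocess.run(..., shell=True). An input such as
    # "notes.txt; rm -rf ~" would be parsed by the shell as two commands.
    return f"cat {filename}"

def build_shell_command_quoted(filename: str) -> str:
    # Safer when a shell string is unavoidable: shlex.quote turns the input
    # into a single literal word for POSIX shells.
    return f"cat {shlex.quote(filename)}"

def build_argv_safe(filename: str) -> list[str]:
    # Preferred: an argument list passed to subprocess.run without a shell.
    # "--" keeps a filename starting with "-" from being read as an option.
    return ["cat", "--", filename]

# --- CWE-330: Use of Insufficiently Random Values -----------------------
ALPHABET = string.ascii_letters + string.digits

def make_token_weak(length: int = 16) -> str:
    # VULNERABLE pattern: the random module is a deterministic PRNG
    # (Mersenne Twister) and is not meant for secrets or session tokens.
    return "".join(random.choice(ALPHABET) for _ in range(length))

def make_token_strong(nbytes: int = 16) -> str:
    # Safer: the secrets module draws from the operating system's CSPRNG.
    return secrets.token_urlsafe(nbytes)

# --- CWE-703: Improper Check or Handling of Exceptional Conditions ------
def parse_port_weak(value: str):
    # VULNERABLE pattern: a broad except silently swallows the error, so the
    # function falls through to an implicit None the caller may not check.
    try:
        return int(value)
    except Exception:
        pass

def parse_port_checked(value: str) -> int:
    # Safer: catch only the expected error, validate the range, and raise a
    # descriptive exception instead of returning a silent sentinel.
    try:
        port = int(value)
    except ValueError as exc:
        raise ValueError(f"invalid port: {value!r}") from exc
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Seeding `random` twice with the same value reproduces the same "token" from `make_token_weak`, which is why CWE-330 matters: output from a predictable generator can be reproduced or predicted by an attacker, whereas `secrets` offers no such seeding hook.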