Modern code generation tools built on AI models such as Large Language Models (LLMs) have gained popularity for producing functional code. However, their use raises security challenges and often results in insecure code being merged into the code base. Evaluating the quality of generated code, especially its security, is therefore crucial. Prior research has explored various aspects of code generation, but its focus on security has been limited, and most studies examined code produced in controlled environments rather than real-world scenarios. To address this gap, we conducted an empirical study of code snippets generated by GitHub Copilot in GitHub projects. Our analysis identified 452 snippets generated by Copilot and found a high likelihood of security issues: 32.8% of the Python snippets and 24.5% of the JavaScript snippets are affected. These issues span 38 different Common Weakness Enumeration (CWE) categories, including significant ones such as CWE-330: Use of Insufficiently Random Values, CWE-78: OS Command Injection, and CWE-94: Improper Control of Generation of Code. Notably, eight of these CWEs are among the 2023 CWE Top-25, highlighting their severity. Our findings confirm that developers should be careful when adding code generated by Copilot and should run appropriate security checks as they accept the suggested code. They also show that practitioners should cultivate corresponding security awareness and skills.
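To make two of the named weakness classes concrete, the following is a minimal Python sketch of how CWE-330 and CWE-78 are commonly avoided. It is an illustration only and is not taken from the snippets analyzed in the study; the function names `new_session_token` and `echo_untrusted` are hypothetical.

```python
import secrets
import subprocess
import sys


def new_session_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token drawn from the OS CSPRNG (avoids CWE-330)."""
    # The `random` module is predictable and not intended for security
    # purposes; `secrets` is the standard-library alternative.
    return secrets.token_urlsafe(num_bytes)


def echo_untrusted(text: str) -> str:
    """Pass untrusted text to a child process as a single argument (avoids CWE-78)."""
    # An argument list without shell=True means no shell parses the input,
    # so metacharacters such as ';' reach the child as literal data.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

The insecure counterparts would be building the token with `random.choice` and building the command as a concatenated string run with `shell=True`; static analyzers typically flag both patterns.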