This research presents a comprehensive examination of the security of web application code generated by Large Language Models, analyzing a dataset of 2,500 small dynamic PHP websites. These AI-generated sites are deployed as standalone websites in Docker containers and then scanned for security vulnerabilities. The evaluation uses a hybrid methodology that combines the Burp Suite active scanner, static analysis, and manual checks. Our investigation focuses on identifying and analyzing File Upload, SQL Injection, Stored XSS, and Reflected XSS vulnerabilities. This approach not only highlights potential security flaws in AI-generated PHP code but also offers a critical perspective on the reliability and security implications of deploying such code in real-world scenarios. Our evaluation confirms that 27% of the programs generated by GPT-4 verifiably contain vulnerabilities in the PHP code; based on static scanning and manual verification, this number is potentially much higher. This poses a substantial risk to software safety and security. To contribute to the research community and foster further analysis, we have made the source code publicly available, alongside a record enumerating the detected vulnerabilities for each sample. This study not only sheds light on the security aspects of AI-generated code but also underscores the critical need for rigorous testing and evaluation of such technologies before they are used in software development.