With the rise of large language models (LLMs) such as ChatGPT, many existing approaches and tools for software security are changing. It is therefore essential to understand how security-aware these models are and how they affect software security practice and education. In the exercises of a software security course at our university, we ask students to use state-of-the-art tools to identify and fix vulnerabilities that we insert into a web application. With the arrival of ChatGPT, especially its GPT-4 version, we wanted to know how students could use ChatGPT to complete these exercise tasks. We gave the vulnerable code to ChatGPT and measured its accuracy in identifying and fixing the vulnerabilities. We also investigated whether ChatGPT can point to proper sources of information to support its outputs. In a white-box setting, ChatGPT identified 20 of the 28 vulnerabilities we inserted into the web application. It also reported three false positives and found four additional vulnerabilities beyond the ones we inserted. For the ten vulnerabilities we want students to fix, ChatGPT made nine satisfactory penetration-testing and fixing recommendations, and it could often point to related sources of information.