Large Language Models (LLMs) are redefining offensive cybersecurity by enabling the generation of harmful machine code with minimal human intervention. While attackers exploit dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can adopt similar approaches to automate traditional pentesting workflows. In this work, we present RedShell, a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. RedShell was trained on a malicious PowerShell dataset from the literature, which we extended with manually curated code samples. Experiments show that our framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art counterparts on distance-based metrics such as edit distance (above 50% average code similarity). Additionally, functional experiments demonstrate the execution reliability of the snippets produced by RedShell in a testing scenario that mirrors real-world settings. This work sheds light on the state of the art in Generative AI applied to malicious code generation and automated testing, and acknowledges the potential benefits of LLMs in controlled environments such as pentesting.