Because Windows is one of the most frequently targeted operating systems, PowerShell has become a key tool for both malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation: automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation, we propose two novel datasets of PowerShell code samples: one with manually curated natural language descriptions, and a code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that fine-tuning NMT models on our dataset is effective at generating offensive PowerShell code. A comparative analysis against ChatGPT, the most widely used LLM service, reveals the specialized strengths of our fine-tuned models.