The rise of Generative AI (GenAI) has reshaped the cybersecurity landscape by enabling new attack vectors and lowering the barrier to executing advanced social engineering campaigns. This study presents an empirical analysis of jailbreaking vulnerabilities in ChatGPT-4o-Mini, showing that novices can bypass safeguards to generate complete multivector phishing attacks across email, web, SMS, and voice channels. Controlled experiments reveal that role-based jailbreaks produce fully operational attack paths capable of credential harvesting. User studies further demonstrate the disruptive potential of GenAI: compared with traditional internet resources, novice participants assisted by GenAI exhibited a 240% increase in perceived phishing competence, a 400% improvement in task completion rates, and a 57% reduction in implementation time. To address these risks, a transformer-based detection framework was developed, achieving an F1-score of 0.9864 with XLNet for identifying malicious prompts. The work underscores the urgency of strengthening LLM guardrails and provides an annotated dataset to support future defenses.