Large Language Models (LLMs) are increasingly capable, aiding in tasks such as content generation, yet they also pose risks, particularly in generating harmful spear-phishing emails. These emails, crafted to entice clicks on malicious URLs, threaten personal information security. This paper proposes an adversarial framework, SpearBot, which utilizes LLMs to generate spear-phishing emails with various phishing strategies. Through specifically crafted jailbreak prompts, SpearBot circumvents security policies and introduces other LLM instances as critics. When a phishing email is identified by a critic, SpearBot refines the generated email based on the critique feedback until it can no longer be recognized as phishing, thereby enhancing its deceptive quality. To evaluate the effectiveness of SpearBot, we implement various machine-based defenders and assess how effectively the generated phishing emails deceive them. Results show that the generated emails largely evade detection, underscoring their deceptive quality. Additionally, human evaluations of the emails' readability and deception are conducted through questionnaires, confirming their convincing nature and the significant potential harm of the generated phishing emails.