Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

AI programs built using large language models make it possible to automatically create phishing emails based on a few data points about a user. They stand in contrast to traditional phishing emails, which hackers design manually using general rules gleaned from experience. The V-Triad is an advanced set of rules for manually designing phishing emails that exploit our cognitive heuristics and biases. In this study, we compare the performance of phishing emails created automatically by GPT-4 with that of emails created manually using the V-Triad. We also combine GPT-4 with the V-Triad to assess their combined potential. A fourth group, exposed to generic phishing emails, served as our control group. We used a factorial approach, sending emails to 112 randomly selected participants recruited for the study. The control group emails received a click-through rate of 19-28%, the GPT-generated emails 30-44%, the emails designed using the V-Triad 69-79%, and the emails created using both GPT and the V-Triad 43-81%. Each participant was asked to explain why they pressed or did not press a link in the email. These answers often contradict each other, highlighting the need for personalized content: the cues that make one person avoid phishing emails make another person fall for them. Next, we used four popular large language models (GPT, Claude, PaLM, and LLaMA) to detect the intention of phishing emails and compared the results to human detection. The language models demonstrated a strong ability to detect malicious intent, even in non-obvious phishing emails. They sometimes surpassed human detection, although they were often slightly less accurate than humans.
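The four-arm design and the click-through metric described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the condition labels, the `assign_conditions` and `click_through_rate` helpers, the fixed seed, and the equal-sized arms are all assumptions, since the abstract does not state how participants were split across groups.

```python
import random
from collections import Counter

# Hypothetical labels for the four conditions named in the abstract.
CONDITIONS = ["control", "gpt", "v_triad", "gpt_plus_v_triad"]


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the four arms.

    Equal-sized arms are an assumption; the abstract gives no group sizes.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: CONDITIONS[i % len(CONDITIONS)]
        for i, pid in enumerate(shuffled)
    }


def click_through_rate(clicks, sent):
    """Fraction of sent emails whose link was pressed."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return clicks / sent


# 112 participants, as in the abstract; IDs 1..112 are placeholders.
assignment = assign_conditions(range(1, 113))
group_sizes = Counter(assignment.values())
```

With 112 participants and four arms, this assignment places 28 participants in each arm. A call such as `click_through_rate(7, 28)` uses made-up counts purely to show the calculation; it does not reproduce any figure reported in the study.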
