The threat of phishing emails has been exacerbated by the ability of Large Language Models (LLMs) to generate highly targeted, personalized, and automated spear phishing attacks. Two problems concerning LLM-facilitated phishing require further investigation: 1) existing studies on lateral phishing do not specifically examine how LLMs can be integrated into large-scale attacks targeting an entire organization, and 2) current anti-phishing infrastructure, despite its extensive development, lacks the capability to prevent LLM-generated attacks, potentially impacting both employees and IT security incident management. Investigating these problems requires a real-world environment, one that functions during regular business operations and mirrors the complexity of a large organizational infrastructure. This setting must also be flexible enough to support a diverse array of experimental conditions, in particular the incorporation of phishing emails crafted by LLMs. This study is a pioneering exploration of the use of LLMs to create targeted lateral phishing emails, conducted against the operations and workforce of approximately 9,000 individuals at a large tier 1 university over an 11-month period. It also evaluates the capability of email filtering infrastructure to detect such LLM-generated phishing attempts, providing insights into its effectiveness and identifying potential areas for improvement. Based on our findings, we propose machine learning-based techniques that detect LLM-generated phishing emails missed by the existing infrastructure, with an F1-score of 98.96.