Phishing emails have become increasingly sophisticated with the rise of Large Language Models (LLMs). As attackers exploit LLMs to craft more convincing and evasive phishing emails, it is crucial to assess the resilience of current phishing defenses. In this study, we conduct a comprehensive evaluation of traditional phishing detectors, such as Gmail Spam Filter, Apache SpamAssassin, and Proofpoint, as well as machine learning models such as SVM, Logistic Regression, and Naive Bayes, in identifying both traditional and LLM-rephrased phishing emails. We also explore the emerging role of LLMs as phishing detection tools, an approach already adopted by companies such as NTT Security Holdings and JPMorgan Chase. Our results reveal notable declines in detection accuracy on rephrased emails across all detectors, highlighting critical weaknesses in current phishing defenses. As the threat landscape evolves, our findings underscore the need for stronger security controls and regulatory oversight of LLM-generated content to prevent its misuse in creating advanced phishing attacks. This study contributes to the development of more effective Cyber Threat Intelligence (CTI) by using LLMs to generate diverse phishing variants for data augmentation, by applying LLMs to phishing detection itself, and by paving the way for more robust and adaptable threat detection systems.