Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

Phishing attacks pose a significant threat to Internet users, with cybercriminals carefully replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios remain underexplored. In this paper, we comprehensively evaluate state-of-the-art visual similarity-based anti-phishing models using a large-scale dataset of 450K real-world phishing websites. Our analysis reveals that while certain models maintain high accuracy, others perform notably worse than their reported results on curated datasets, highlighting the importance of real-world evaluation. In addition, we observe real-world tactics in which phishing attackers manipulate visual components to circumvent detection systems. To assess the robustness of existing models against adversarial attacks, we apply visible and perturbation-based manipulations to website logos, which adversaries typically target, and evaluate how the models handle the resulting adversarial samples. Our findings reveal vulnerabilities in several models, emphasizing the need for more robust visual similarity techniques capable of withstanding sophisticated evasion attempts. We provide actionable insights for enhancing the security of phishing defense systems and encouraging proactive defensive measures. To the best of our knowledge, this work represents the first large-scale, systematic evaluation of visual similarity-based models for phishing detection in real-world settings, and it underscores the need for more effective and robust defenses.
