In the pursuit of an effective spam detection system, the focus has often been on identifying known spam patterns, either through rule-based detection systems or through machine learning (ML) solutions that rely on keywords. However, both approaches are susceptible to evasion techniques and zero-day attacks that can be mounted at low cost. As a result, an email that has bypassed the defense system once can do so again in the following days, even after the rules are updated or the ML models are retrained. Repeatedly failing to detect emails whose layout resembles that of previously missed spam is concerning for customers and can erode their trust in a company. Our observations show that threat actors reuse email kits extensively and can bypass detection with little effort, for example, by changing the content of the emails. In this work, we propose Pisco, an email visual similarity detection approach designed to improve the detection capabilities of an email threat defense system. We apply our proof of concept to real-world samples received from different sources. Our results show that email kits are reused extensively and that visually similar emails are sent to our customers at various time intervals. This method could therefore be very helpful when detection engines that rely on textual features and keywords are bypassed, which our observations show happens frequently.