Miscreants register thousands of new domains every day to launch Internet-scale attacks such as spam, phishing, and drive-by downloads, which underscores the need for better detection methods. This paper introduces an approach for identifying suspicious domains at the onset of the registration process. The accompanying data pipeline generates features by comparing each new domain to already registered domains, the most important of which is a similarity score. Using a novel combination of Natural Language Processing (NLP) techniques, including a pretrained CANINE model, and Multilayer Perceptron (MLP) models, our system analyzes both the semantic and the numerical attributes of a domain to support early threat detection. Flagging domains at registration time reduces the window of vulnerability. Our findings demonstrate the effectiveness of this integrated approach and contribute to ongoing efforts to develop proactive strategies that mitigate the risks of illicit online activity through early identification of suspicious domain registrations.
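The comparison of a new domain against registered domains can be illustrated with a minimal sketch. The abstract does not state which similarity measure the pipeline uses, so the sketch assumes the character-level ratio from Python's standard `difflib` module as a stand-in. The function names `similarity` and `similarity_features`, the feature keys, and the sample domains are all hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a character-level similarity ratio in [0, 1].

    Assumption: difflib's ratio stands in for the paper's similarity score,
    whose exact definition is not given in the abstract.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_features(new_domain: str, registered: list[str]) -> dict:
    """Compare a new domain to known domains and summarize the result.

    The returned keys are hypothetical examples of numerical attributes
    that a downstream classifier (such as an MLP) could consume.
    """
    features = {
        "length": len(new_domain),
        "digit_count": sum(ch.isdigit() for ch in new_domain),
        "max_similarity": 0.0,
        "closest_domain": None,
    }
    if not registered:
        return features

    # Score the new domain against every registered domain, keep the best match.
    scores = {domain: similarity(new_domain, domain) for domain in registered}
    closest = max(scores, key=scores.get)
    features["max_similarity"] = scores[closest]
    features["closest_domain"] = closest
    return features


if __name__ == "__main__":
    # Hypothetical sample data: a look-alike of a well-known domain.
    known = ["paypal.com", "example.org"]
    print(similarity_features("paypa1.com", known))
```

A high `max_similarity` for a domain that is not identical to its closest match is the kind of signal a look-alike registration would produce. In the system described above, such numerical features would be combined with the semantic representation from the pretrained language model rather than used alone.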