A Longitudinal Study of Recently Observed Malicious Domains: Characteristics, Infrastructure, and Abuse Patterns

We present a longitudinal study of approximately 1.52 million malicious domains observed on VirusTotal (VT) between January and May 2026. Domains were selected on the basis of detection by at least five independent VT scanning engines and a first-seen date within the study window. We group the dataset into compromised domains and attacker-created domains, which account for approximately 89.3% of the dataset. Combining WHOIS registration records and passive DNS (PDNS) data with the VT dataset, we characterise attacker behaviour across eight dimensions: temporal distribution, compromised vs. attacker-created classification, domain age at first detection, registrar and TLD preferences, DNS query volume as a damage proxy, hosting infrastructure concentration (IP and ASN level), bulk registration patterns, and brand impersonation. Key findings include: the majority of attacker-created domains are short-lived registrations used within weeks of creation; a small number of registrars and TLDs account for most abuse; Cloudflare infrastructure is heavily exploited for domain fronting; bulk registration events involving thousands of domains from a single registrar on a single day are widespread; and several global brands, particularly WhatsApp and Google, are heavily impersonated. We share the annotated dataset in the GitHub repository https://github.com/mufimash/malicious_domains for further research.
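
As an illustration only, the selection criteria and two of the dimensions above (domain age at first detection and bulk registration) could be sketched as follows. The record layout, field names, exact window boundaries, and the bulk threshold parameter are all assumptions made for this example; they are not taken from the study's dataset or code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; the field names are illustrative and do not
# reflect the schema of the published dataset.
@dataclass(frozen=True)
class DomainRecord:
    domain: str
    detections: int   # number of VT engines flagging the domain
    first_seen: date  # first-seen date on VT
    created: date     # WHOIS creation date
    registrar: str


# Assumed boundaries: the abstract only says "between January and May 2026".
STUDY_START = date(2026, 1, 1)
STUDY_END = date(2026, 5, 31)
MIN_ENGINES = 5  # "at least five independent VT scanning engines"


def is_selected(rec: DomainRecord) -> bool:
    """Apply the two selection criteria stated in the abstract."""
    in_window = STUDY_START <= rec.first_seen <= STUDY_END
    return rec.detections >= MIN_ENGINES and in_window


def age_at_detection_days(rec: DomainRecord) -> int:
    """Days between WHOIS creation and first detection on VT."""
    return (rec.first_seen - rec.created).days


def bulk_registration_events(records, threshold):
    """Count domains per (registrar, creation date) pair and keep the pairs
    that reach the caller-chosen threshold (an assumed parameter)."""
    counts = Counter((r.registrar, r.created) for r in records)
    return {key: n for key, n in counts.items() if n >= threshold}
```

A caller would first filter records with `is_selected`, then pass the survivors to `bulk_registration_events` with a threshold suited to the scale of interest; the abstract speaks of events involving thousands of domains.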

