With nearly 2.5 million users, onion services have become a prominent part of the darkweb. Over the last five years alone, the number of onion domains has increased twentyfold, reaching more than 700,000 unique domains in January 2022. Because onion services host various types of illicit content, they have become a valuable resource for darkweb research and an integral part of e-crime investigation and threat intelligence. However, this content is largely unindexed by today's search engines, so researchers have to rely on outdated or manually collected datasets that are limited in scale, scope, or both. To tackle this problem, we built Dizzy, an open-source crawling and analysis system for onion services. Dizzy implements novel techniques to explore, update, check, and classify onion services at scale without overwhelming the Tor network. We deployed Dizzy in April 2021 and used it to analyze more than 63.3 million crawled onion webpages, focusing on domain operations, web content, cryptocurrency usage, and the web graph. Our main findings show that onion services are unreliable due to their high churn rate; have a relatively small number of reachable domains, which are often similar and illicit; host a growing underground cryptocurrency economy; and form a graph that is relatively tightly knit to, but topologically different from, the regular web's graph.