With nearly 2.5 million users, onion services have become a prominent part of the darkweb. Over the last five years alone, the number of onion domains has increased 20-fold, reaching more than 700,000 unique domains in January 2022. Because onion services host various types of illicit content, they have become a valuable resource for darkweb research and an integral part of e-crime investigation and threat intelligence. However, this content is largely unindexed by today's search engines, and researchers have to rely on outdated or manually collected datasets that are limited in scale, scope, or both. To tackle this problem, we built Dizzy, an open-source crawling and analysis system for onion services. Dizzy implements novel techniques to explore, update, check, and classify onion services at scale without overwhelming the Tor network. We deployed Dizzy in April 2021 and used it to analyze more than 63.3 million crawled onion webpages, focusing on domain operations, web content, cryptocurrency usage, and the web graph. Our main findings show that onion services are unreliable due to their high churn rate, have a relatively small number of reachable domains that are often similar and illicit, enjoy a growing underground cryptocurrency economy, and have a graph that is relatively tightly knit to, but topologically different from, the regular web's graph.