Cybercrime Bitcoin Revenue Estimations: Quantifying the Impact of Methodology and Coverage

Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often disagree because they use different methodologies, seed addresses, and time periods, which makes it challenging to isolate the impact of their methodological differences. Furthermore, they underestimate the revenue because of limited coverage of the target's payment addresses, but how large this impact is remains unknown. In this work, we perform the first systematic analysis of the estimation of cybercrime bitcoin revenue. We implement a tool that can replicate the different estimation methodologies. Using our tool, we quantify, in a controlled setting, the impact of the different methodology steps. In contrast to what is widely believed, we show that the revenue is not always underestimated: some methodologies can introduce huge overestimation. We collect 30,424 payment addresses and use them to compare the financial impact of 6 cybercrimes (ransomware, clippers, sextortion, Ponzi schemes, giveaway scams, exchange scams) and of 141 cybercriminal groups. We observe that the popular multi-input clustering fails to discover addresses for 40% of groups. We quantify, for the first time, the impact of limited coverage on the estimation. For this, we propose two techniques to achieve high, possibly near-complete, coverage on the DeadBolt server ransomware. Our expanded coverage enables estimating DeadBolt's revenue at $2.47M, 39 times higher than the estimation obtained using two popular Internet scan engines.
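To make the multi-input clustering step concrete, the following is a minimal sketch of the general heuristic, not the tool described above. It assumes a toy transaction format (dictionaries with `inputs` and `outputs`, values in satoshis) and hypothetical helper names (`multi_input_clusters`, `expand_seeds`, `estimate_revenue`). Skipping transactions funded by the address set itself is an illustrative choice to avoid counting internal transfers as victim payments.

```python
from collections import defaultdict


def multi_input_clusters(transactions):
    """Group addresses that appear together as inputs of one transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())


def expand_seeds(seeds, transactions):
    """Add every address that shares a cluster with a seed address."""
    seeds = set(seeds)
    expanded = set(seeds)
    for cluster in multi_input_clusters(transactions):
        if cluster & seeds:
            expanded |= cluster
    return expanded


def estimate_revenue(addresses, transactions):
    """Sum deposits to `addresses`, skipping transactions they fund."""
    total = 0
    for tx in transactions:
        if any(addr in addresses for addr in tx["inputs"]):
            continue  # assumed internal movement, not a victim payment
        total += sum(v for addr, v in tx["outputs"] if addr in addresses)
    return total


# Toy ledger (hypothetical addresses, values in satoshis).
TXS = [
    {"inputs": ["victim1"], "outputs": [("A", 100)]},
    {"inputs": ["victim2"], "outputs": [("B", 50)]},
    {"inputs": ["A", "B"], "outputs": [("cashout", 150)]},
    {"inputs": ["victim3"], "outputs": [("C", 70)]},
]

seed_only = estimate_revenue({"A"}, TXS)
expanded = expand_seeds({"A"}, TXS)
with_clustering = estimate_revenue(expanded, TXS)
```

In this toy ledger, address `C` is never co-spent with a seed, so clustering cannot discover it and its deposit stays out of the estimate. This is the kind of coverage gap the abstract refers to.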
