Assessing Crime Disclosure Patterns in a Large-Scale Cybercrime Forum

Cybercrime forums play a central role in the cybercrime ecosystem, serving as hubs for the exchange of illicit goods, services, and knowledge. Previous studies have explored the market and social structures of these forums, but less is known about users' behavioural dynamics, particularly how participants disclose criminal activity. This study provides the first large-scale assessment of crime disclosure patterns in a major cybercrime forum, analysing over 3.5 million posts from nearly 300,000 users. Using a three-level classification scheme (benign, grey, and crime) and a scalable labelling pipeline powered by large language models (LLMs), we measure the level of crime disclosure in initial posts, analyse how participants switch between levels, and assess how crime disclosure behaviour relates to private communications. Our results show that crime disclosure is relatively normative: one quarter of initial posts include explicit crime-related content, and more than one third of users disclose criminal activity at least once in their initial posts. At the same time, most participants show restraint: over two thirds post only benign or grey content, and escalation in disclosure is typically gradual. Grey initial posts are particularly prominent, indicating that many users avoid overt statements and instead anchor their activity in ambiguous content. The study highlights the value of LLM-based text classification and Markov chain modelling for capturing crime disclosure patterns, offering insights for law enforcement efforts to distinguish benign, grey, and criminal content in cybercrime forums.
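
The Markov chain analysis of how users switch between levels can be illustrated with a minimal sketch. It assumes each user's initial posts have already been labelled benign, grey, or crime and ordered by posting time, and it uses a standard first-order maximum-likelihood estimate (transition counts normalised per row); the abstract does not state the exact estimator, so this is an illustration rather than the study's implementation. The function names `transition_matrix` and `share_ever_crime` and the toy label histories are hypothetical.

```python
from collections import Counter

# The three disclosure levels from the classification scheme.
STATES = ("benign", "grey", "crime")


def transition_matrix(sequences):
    """Maximum-likelihood first-order Markov transition matrix (illustrative).

    `sequences` holds one list of labels per user, ordered by posting time.
    Returns P where P[i][j] estimates Pr(next = STATES[j] | current = STATES[i]).
    States never observed as the source of a transition keep an all-zero row.
    """
    counts = Counter()
    for seq in sequences:
        # Count every consecutive (current, next) pair within one user's history.
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1

    matrix = []
    for a in STATES:
        row_total = sum(counts[(a, b)] for b in STATES)
        # Normalise each row so its probabilities sum to 1 (if it has any data).
        matrix.append([counts[(a, b)] / row_total if row_total else 0.0 for b in STATES])
    return matrix


def share_ever_crime(sequences):
    """Fraction of users with at least one post labelled 'crime'."""
    return sum("crime" in seq for seq in sequences) / len(sequences)


# Toy, made-up label histories for three hypothetical users.
users = [
    ["benign", "grey", "grey", "crime"],
    ["grey", "grey", "benign"],
    ["benign", "benign"],
]
P = transition_matrix(users)
for state, row in zip(STATES, P):
    print(state, [round(p, 2) for p in row])
# benign [0.5, 0.5, 0.0]
# grey [0.25, 0.5, 0.25]
# crime [0.0, 0.0, 0.0]
print(round(share_ever_crime(users), 2))  # 0.33
```

Each row of the resulting matrix reads as "given a user's current level, how likely is each level in their next initial post"; a gradual escalation pattern would show up as most probability mass on the diagonal and on adjacent levels (benign to grey, grey to crime) rather than on direct benign-to-crime jumps.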