The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset

Discourse around conspiracy theories is thriving amid the misinformation rampant in online environments. Research in this field has focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset of accounts engaged in conspiracy-related activities throughout 2022. Our data collection is independent of specific conspiracy theories and information operations. The dataset also includes a control group of randomly selected users who can be fairly compared with the users involved in conspiracy activities. This collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users are similar in their profile metadata. However, they diverge significantly in behavior and activity, particularly in the topics they discuss, the terminology they use, and their stance on trending subjects. Interestingly, there is no significant difference in the presence of bot users between the two groups, suggesting that conspiracy and automation are orthogonal concepts. Finally, we develop a classifier to identify conspiracy users using 93 features, some of which are commonly employed in the literature for troll identification. The classifier achieves high accuracy (an average F1 score of 0.98), enabling us to uncover the most discriminative features associated with conspiracy-related accounts.
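For readers unfamiliar with the reported metric, the following minimal sketch shows how an average F1 score can be computed from confusion counts. The abstract does not state how the average is taken; averaging over evaluation folds is an assumption here, and the function names and all counts are hypothetical illustrations, not results from the study.

```python
from typing import Iterable, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall (a value in [0, 1])."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(folds: Iterable[Tuple[int, int, int]]) -> float:
    """Average the per-fold F1 scores; each fold is a (tp, fp, fn) triple."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in folds]
    return sum(scores) / len(scores)


# Hypothetical per-fold counts for the "conspiracy" class (illustrative only).
example_folds = [(98, 2, 2), (97, 3, 1), (99, 1, 3)]
print(f"average F1: {average_f1(example_folds):.2f}")
```

Because F1 is bounded by 0 and 1, a score such as 0.98 is read as a fraction rather than a percentage.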
