The argument that social media influence campaigns are persistent, and often funded by malicious entities, is gaining traction. These entities use instrumented profiles to disseminate divisive content and disinformation, shaping public perception. Despite ample evidence that such instrumented profiles exist, few methods are available for identifying them in the wild. To evade detection and appear genuine, small clusters of instrumented profiles engage in unrelated discussions, diverting attention from their true goals. This strategic thematic diversity conceals their selective polarity towards certain topics and fosters public trust. This study aims to characterize profiles potentially used for influence operations, termed 'on-mission' profiles, relying solely on thematic content diversity within unlabeled data. What distinguishes this work is its focus on content volume and toxicity towards specific themes. Longitudinal data from 138K Twitter (now X) profiles and 293M tweets enables profiling based on theme diversity. Groups with high thematic diversity predominantly produce toxic content on specific themes, such as politics, health, and news, which leads us to classify them as 'on-mission' profiles. Using the identified 'on-mission' profiles, we design a classifier for unseen, unlabeled data. We train and test a linear SVM model on an 80/20 split of the most diverse profiles. The classifier achieves 100% accuracy, facilitating the discovery of previously unknown 'on-mission' profiles in the wild.
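The abstract does not state how thematic diversity is measured, so the following is only a minimal sketch under one assumption: that diversity can be approximated by the normalized Shannon entropy of the theme labels attached to a profile's tweets. The function names `thematic_diversity` and `rank_by_diversity`, the fixed theme vocabulary size `num_themes`, and the toy profiles are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def thematic_diversity(themes, num_themes):
    """Normalized Shannon entropy of a profile's theme labels, in [0, 1].

    `themes` holds one theme label per tweet; `num_themes` is the size of
    the theme vocabulary. Both are assumptions made for this sketch.
    """
    if num_themes < 2:
        raise ValueError("num_themes must be at least 2")
    counts = Counter(themes)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        # No tweets, or every tweet on a single theme: no diversity.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_themes)


def rank_by_diversity(profiles, num_themes):
    """Return profile ids sorted from most to least thematically diverse."""
    return sorted(
        profiles,
        key=lambda pid: thematic_diversity(profiles[pid], num_themes),
        reverse=True,
    )


# Toy data (hypothetical): one theme label per tweet for each profile.
profiles = {
    "a": ["politics"] * 10,
    "b": ["politics", "health", "news", "sports"] * 3,
    "c": ["sports"] * 9 + ["news"],
}
ranked = rank_by_diversity(profiles, num_themes=4)
```

Under this assumption, a profile that spreads its tweets evenly across all themes scores close to 1, while a single-theme profile scores 0. The most diverse profiles at the top of the ranking would then be the candidates for the toxicity analysis and the classifier training described above.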