The interaction between fringe subcultures and mainstream online communities poses significant challenges for understanding discourse on social media. In this work, we investigate whether users active in conspiracy-focused communities exhibit detectable linguistic signatures when participating in general-interest spaces, such as news, humor, or hobbyist forums. We analyze a large-scale longitudinal dataset of over 500 million comments spanning 10 years of Reddit activity, examining how these users communicate across diverse social contexts, independent of the topics they discuss. We show that these users exhibit distinctive linguistic patterns that enable machine learning models to reliably distinguish them from the general population within individual communities (averaging 87\% accuracy across more than 20 binary classification tasks). Crucially, no single aggregate model captures these patterns across communities: community-specific models outperform global classifiers by up to 17 percentage points. This result suggests that while these users are distinct, their linguistic expression is dynamic and highly responsive to the social norms of the environment they inhabit. Our findings point to the need for tailored interventions in online spaces, as linguistic signals associated with conspiracy and fringe subcultures vary across communities and cannot be effectively addressed by uniform detection or moderation strategies.