Leveraging Content Producer Networks and User Perception to Detect Online Discursive Communities

Online discussions are often characterized by strong behavioral asymmetries: a relatively small fraction of users actively produces content, while the majority primarily consumes and redistributes it. Here we propose a community-detection framework for online social networks that exploits this asymmetry by first identifying and clustering a set of leading users, and then extending the resulting labels to the broader user base. We introduce two complementary strategies to cluster leaders, one based on their mutual interactions and the other on audience overlap, both relying on entropy-based filtering to separate signal from noise. We evaluate the framework on three major Italian political debates on Twitter/X, using public figures (identified through the pre-2022 verification system) as leaders, and known affiliations of political actors as ground truth labels. Compared with standard baselines, the proposed approach yields more coherent and interpretable communities aligned with political structures, with the two variants respectively recovering parties and coalitions. Activity-based criteria for selecting leaders produce qualitatively similar but consistently weaker results, particularly at the coalition level. Overall, our findings show that statistically validated networks of publicly recognized figures, whose off-platform roles constrain and stabilize their online behavior, provide a strong basis to identify discursive communities on social media. Although developed for Twitter/X, the approach is conceptually general, as it leverages structural asymmetries common to many online platforms.
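The second stage of the framework, extending leader labels to the broader user base, can be illustrated with a minimal sketch. The abstract does not specify the extension rule, so the majority vote used here is an assumption made purely for illustration, and the names `extend_labels`, `leader_labels`, and `interactions` are hypothetical. The sketch also omits the entropy-based filtering step entirely.

```python
from collections import Counter, defaultdict


def extend_labels(leader_labels, interactions):
    """Give each non-leader the most frequent community among the leaders
    it interacted with (an assumed rule, not the paper's stated method).

    leader_labels: dict mapping leader id -> community label
    interactions:  iterable of (user id, leader id) pairs, e.g. retweets
    """
    votes = defaultdict(Counter)
    for user, leader in interactions:
        if user in leader_labels:
            continue  # leaders keep the label from the clustering stage
        label = leader_labels.get(leader)
        if label is not None:
            votes[user][label] += 1

    extended = dict(leader_labels)
    for user, counts in votes.items():
        ranked = counts.most_common(2)
        # Assumption: users whose top two communities tie stay unlabeled.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            extended[user] = ranked[0][0]
    return extended


leader_labels = {"A": "party1", "B": "party2"}
interactions = [
    ("u1", "A"), ("u1", "A"), ("u1", "B"),  # u1 leans towards party1
    ("u2", "B"),                            # u2 only interacts with party2
    ("u3", "A"), ("u3", "B"),               # u3 is tied and stays unlabeled
]
labels = extend_labels(leader_labels, interactions)
```

Leaving tied users unlabeled is one conservative choice among several; a weighted or iterative propagation scheme would be an equally plausible reading of the abstract.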
