Social media platforms are often blamed for exacerbating political polarization and worsening public dialogue. Many claim that hyperpartisan users post pernicious content, slanted to their political views, inciting contentious and toxic conversations. However, what factors actually contribute to increased online toxicity and negative interactions? In this work, we explore the role that political ideology plays in contributing to toxicity on Twitter, at both the individual-user level and the topic level. To do this, we train and open-source a DeBERTa-based toxicity detector with a contrastive objective that outperforms the Google Jigsaw Perspective toxicity detector on the Civil Comments test dataset. Then, after collecting 187 million tweets from 55,415 Twitter users, we determine how several account-level characteristics, including political ideology and account age, predict how often each user posts toxic content. Running a linear regression, we find that the diversity of views and the toxicity of the other accounts with which a user engages have a more marked effect on that user's own toxicity. Namely, toxic comments are correlated with users who engage with a wider array of political views. Performing topic analysis on the toxic content posted by these accounts using the large language model MPNet and a version of the DP-Means clustering algorithm, we find similar behavior across 6,592 individual topics, with conversations on each topic becoming more toxic as a more diverse set of users becomes involved.
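For readers unfamiliar with DP-Means, the following is a minimal sketch of the standard algorithm (a k-means-like procedure in which any point farther than a threshold from every centroid starts a new cluster). It is an illustration only: the abstract says "a version of" DP-Means, so the variant used in this work may differ, and the toy 2-D points stand in for MPNet embeddings. The names `dp_means`, `sq_dist`, `mean`, and the threshold `lam` are hypothetical and chosen for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(points, lam, max_iter=100):
    """Standard DP-Means (illustrative; not necessarily the paper's variant).

    A point whose squared distance to its nearest centroid exceeds `lam`
    becomes the centroid of a new cluster, so the number of clusters is
    not fixed in advance.
    """
    centroids = [mean(points)]          # start with one global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:          # too far from everything: new cluster
                centroids.append(list(p))
                k = len(centroids) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centroids and drop clusters that ended up empty.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centroids = [mean(members[k]) for k in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return labels, centroids


# Toy data: two well-separated groups standing in for text embeddings.
toy_points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
toy_labels, toy_centroids = dp_means(toy_points, lam=1.0)
print(len(toy_centroids), toy_labels)
```

The threshold `lam` controls granularity: a smaller value yields more, tighter clusters, which is why this family of methods suits settings where the number of topics is not known beforehand.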