Toxic language remains an ongoing challenge on social media platforms, posing significant problems for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages (English, German, Spanish, Turkish, Arabic, and Dutch), covering 80 topics such as Culture, Politics, and News. We analyze how toxicity spikes within different communities in relation to specific topics. For certain topics, we observe consistent patterns of increased toxicity across languages, while also noting significant variations within specific language communities.