Toxic language, such as hate speech, can deter users from participating in online communities and enjoying popular platforms. Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter. These approaches are less effective when applied to conversations on live-streaming platforms, such as Twitch and YouTube Live, as each comment is only visible for a limited time and lacks a thread structure that establishes its relationship with other comments. In this work, we share the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms. We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch. We articulate several facets of live-stream data that differ from other forums, and demonstrate that existing models perform poorly in this setting. By conducting a user study, we identify the informational context humans use in live-stream moderation, and train models leveraging context to identify norm violations. Our results show that appropriate contextual information can boost moderation performance by 35%.