Toxicity in digital media poses significant challenges, yet its dynamics in podcasts, a rapidly growing medium, have received little attention. This paper addresses that gap by analyzing political podcast data to study how toxicity emerges and propagates, focusing on conversation chains: structured reply patterns within podcast transcripts. Using state-of-the-art transcription models and conversational analysis techniques, we systematically examine toxic discourse in over 30 popular political podcasts in the United States. Our key contributions are: (1) a comprehensive dataset of transcribed and diarized political podcasts, in which we identify thousands of toxic instances using Google's Perspective API; (2) evidence of a concerning trend, namely that a majority of episodes contain at least one toxic instance; (3) the notion of toxic conversation chains and an analysis of their structural and linguistic properties, which reveals characteristics such as longer durations, repetitive patterns, figurative language, and emotional cues tied to anger and annoyance; (4) the identification of demand-related words such as 'want', 'like', and 'know' as precursors to toxicity; and (5) predictive models that anticipate toxicity shifts based on annotated change points. Our findings provide insight into podcast toxicity and establish a foundation for future research on real-time monitoring and intervention mechanisms to foster healthier discourse in this influential medium.
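As an illustration of the chain idea mentioned in the abstract, the following is a minimal sketch of how toxic conversation chains might be extracted from a diarized transcript. It is not the authors' method: the abstract does not specify how chains are constructed. The `Segment` type, the `toxic_chains` function, the 0.7 threshold, and the fixed context window are all hypothetical choices made for this example. The toxicity scores are assumed to be precomputed values in [0, 1], such as those returned by a toxicity classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    speaker: str     # diarization label, e.g. "SPEAKER_00" (hypothetical format)
    text: str        # transcribed utterance
    toxicity: float  # score in [0, 1], assumed to be precomputed


def toxic_chains(
    segments: List[Segment],
    threshold: float = 0.7,  # assumed cutoff, not taken from the paper
    context: int = 2,        # assumed number of turns kept on each side
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges around toxic segments.

    Each segment scoring at or above `threshold` anchors a window of
    `context` turns before and after it. Windows that overlap or touch
    are merged into a single chain.
    """
    chains: List[Tuple[int, int]] = []
    for i, seg in enumerate(segments):
        if seg.toxicity < threshold:
            continue
        start = max(0, i - context)
        end = min(len(segments) - 1, i + context)
        if chains and start <= chains[-1][1] + 1:
            # Overlaps or touches the previous window: extend that chain.
            chains[-1] = (chains[-1][0], max(chains[-1][1], end))
        else:
            chains.append((start, end))
    return chains
```

Calling `toxic_chains(segments, context=1)` on a scored transcript yields index ranges that can then be sliced out of `segments` for structural or linguistic analysis. Merging adjacent windows is one design choice among several; a definition based on speaker replies rather than fixed windows would be equally plausible.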