Online harassment is a widespread social and public health concern, yet most computational approaches for detecting and addressing harassment focus on publicly visible social media content rather than private messaging environments. Private conversations present unique challenges because harmful interactions often unfold through context-dependent, multi-turn exchanges, and victims may lack timely support during moments of harassment. In this study, we investigate how large language models (LLMs) can support both the detection of and response to online harassment in private messaging. Using a dataset of 80,053 Instagram direct messages donated by 26 adolescents aged 12–18, including youth with suicide risk factors, we first construct a human-labeled dataset of online harassment in private conversations and develop a context-aware cascading LLM classification pipeline. The proposed pipeline outperforms baseline toxicity classifiers trained primarily on public social media data. We then develop a victim-centered response framework that produces context-sensitive and psychologically grounded AI-generated responses to online harassment messages. Human evaluators perceived the AI-generated responses as significantly more helpful than the original participant responses (95% CI: 0.767–0.815, p < .001), particularly in terms of emotional support and de-escalation. Our findings highlight the potential of context-aware and victim-centered AI systems to provide just-in-time support during harassment in private messaging environments.