Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, escalating online conflict. The existing literature focuses largely on platform-led moderation as a countermeasure. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice: online users engage with suspected LLM-powered accounts to circumvent model safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives.