Who, Why, and How: Disentangling the Effects of Moderation Source, Context, and Language on Post-Removal Behavior

Content moderation is a central mechanism through which platforms attempt to balance user engagement with community governance. Yet existing research has largely treated moderation as a uniform intervention, overlooking how moderation source, violation context, and linguistic style jointly shape user behavior. Drawing on the Human-AI Interaction Theory of Interactive Media Effects (HAII-TIME), this study examines how these three dimensions produce divergent post-moderation behavioral trajectories in a large-scale observational dataset of 11,795,036 moderation events involving 9,285,410 users across 61,261 subreddits on Reddit (2021–2025). Using probabilistic behavioral classification, analysis of variance (ANOVA), and ordinary least squares (OLS) regression with linguistic features derived from principal component analysis (PCA), we find that bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation, challenging the assumption that human agency cues are inherently advantageous. Modteam moderation produces the strongest self-censorship effects, suggesting that institutional depersonalization is a meaningful driver of behavioral withdrawal. Violation severity emerges as a critical contingency: linguistic strategies that are effective in routine contexts (elaborated explanation, community-scale appeals, and direct personal address) can backfire for serious violations, whereas prosocially framed and emotionally emphatic messages become most effective when stakes are highest. Of the 480 linguistic interactions tested, 33 remain significant after false discovery rate (FDR) correction. These findings extend HAII-TIME by introducing violation salience as a moderator of cue-based processing and offer empirical grounding for context-adaptive moderation design.
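The abstract does not name the specific FDR procedure used to screen the 480 interaction terms. A common choice is the Benjamini-Hochberg step-up procedure, which is assumed in the minimal Python sketch below. The function name `benjamini_hochberg`, the target rate `q = 0.05`, and the p-values in the usage example are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> list[bool]:
    """Hypothetical helper: flag which hypotheses are rejected under the
    Benjamini-Hochberg step-up procedure at false discovery rate q.

    With m tests and p-values sorted ascending as p_(1) <= ... <= p_(m),
    find the largest rank k with p_(k) <= (k / m) * q and reject every
    hypothesis whose rank is <= k.
    """
    m = len(p_values)
    # Indices of the hypotheses ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up search: keep the LARGEST rank that satisfies the BH bound,
    # even if some smaller ranks fail it.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject all hypotheses at or below the cutoff rank, in original order.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy p-values for ten hypothetical interaction terms (not study data).
    toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    flags = benjamini_hochberg(toy_p, q=0.05)
    print(sum(flags), "of", len(toy_p), "survive FDR correction")
```

In this toy input only the two smallest p-values fall under their rank-specific thresholds (0.005 and 0.010), so two of ten terms survive. Because the bound grows with rank, BH is less conservative than a Bonferroni cut of q / m, which matters when hundreds of interactions are tested at once.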

翻译：内容审核是平台在用户参与和社区治理之间寻求平衡的核心机制。然而，现有研究大多将审核视为统一的干预手段，忽视了审核者来源、违规语境和语言风格如何共同塑造用户行为。借鉴人机交互的互动媒体效应理论，本研究基于Reddit平台（2021-2025年）涵盖11,795,036次审核事件、9,285,410名用户和61,261个子版块的大规模观测数据集，考察这三个维度如何产生差异化的审核后行为轨迹。通过概率性行为分类、方差分析和结合基于主成分分析提取的语言特征的普通最小二乘回归，我们发现机器人审核在合规率和自我审查抑制方面始终优于人工审核或审核团队审核，挑战了“人性化代理线索天然具有优势”的假设。审核团队审核产生了最强的自我审查效应，表明制度性去人格化是行为退缩的重要驱动因素。违规严重程度成为关键调节变量：在常规语境中有效的语言策略——详细解释、社区层面呼吁、直接个人化表述——在处理严重违规时可能适得其反，而亲社会框架和情感强调的信息在风险最高时效果最佳。在测试的480种语言交互中，33种通过了错误发现率校正。这些发现通过引入违规显著度作为线索加工过程的调节变量，扩展了人机交互的互动媒体效应理论，并为上下文自适应审核设计提供了实证基础。