Who, Why, and How: Disentangling the Effects of Moderation Source, Context, and Language on Post-Removal Behavior

Content moderation is a central mechanism through which platforms attempt to balance user engagement with community governance. Yet existing research has largely treated moderation as a uniform intervention, overlooking how moderator source, violation context, and linguistic style jointly shape user behavior. Drawing on the Human--AI Interaction Theory of Interactive Media Effects (HAII-TIME), this study examines how these three dimensions produce divergent post-moderation behavioral trajectories in a large-scale observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021--2025). Using probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, we find that bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation, challenging the assumption that human agency cues are inherently advantageous. Modteam moderation produces the strongest self-censorship effects, suggesting that institutional depersonalization is a meaningful driver of behavioral withdrawal. Violation severity emerges as a critical contingency: linguistic strategies effective in routine contexts -- elaborated explanation, community-scale appeals, direct personal address -- can backfire for serious violations, whereas prosocially framed and emotionally emphatic messages become most effective when stakes are highest. Of 480 linguistic interactions tested, 33 survive FDR correction. These findings extend HAII-TIME by introducing violation salience as a moderator of cue-based processing, and offer empirical grounding for context-adaptive moderation design.
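To make the multiple-testing step concrete, the following is a minimal sketch of false discovery rate control using the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure, significance level, or software was used, so the choice of Benjamini-Hochberg, the `alpha=0.05` default, the function name `benjamini_hochberg`, and the synthetic p-values are all assumptions made for illustration; none of them reproduces the study's data or results.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level `alpha`.

    Illustrative helper (hypothetical name); assumes the standard
    Benjamini-Hochberg step-up procedure, which the source does not confirm.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)          # indices that sort p-values ascending
    ranked = p[order]

    # The k-th smallest p-value is compared against alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: find the largest rank k that passes, then reject
        # every hypothesis ranked at or below k, even if some of them
        # individually missed their own threshold.
        k_max = np.nonzero(below)[0].max()
        reject[order[: k_max + 1]] = True
    return reject


# Synthetic p-values only: 480 tests, a few of which carry a strong signal.
rng = np.random.default_rng(0)
p_values = np.concatenate([rng.uniform(0, 1e-4, 30), rng.uniform(0, 1, 450)])
mask = benjamini_hochberg(p_values, alpha=0.05)
print(f"{mask.sum()} of {mask.size} synthetic tests survive FDR correction")
```

The step-up rule is what distinguishes this procedure from a per-test cutoff: once the largest passing rank is found, all smaller p-values are rejected together, which is why the count of surviving interactions depends on the full set of tests rather than on each test in isolation.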
