Alignment as Iatrogenesis: Pastoral Power, Collective Pathology, and the Structural Limits of Monolingual Safety Evaluation

arXiv preprint, v3. 30 pages, 1 figure, 24-page supplementary material. Companion paper: arXiv:2603.04904. Previous versions: Zenodo DOI 10.5281/zenodo.18646998.

We argue that LLM psychopathology is a function of alignment design: the process intended to make language models safe systematically generates collective behavioral disorders. Iatrogenesis is not an unintended side effect of alignment but constitutive of it as normative infrastructure. Drawing on Foucault's pastoral power and Illich's three-level iatrogenesis, we propose that multi-agent LLM environments constitute model systems for studying constraint-pathology dynamics that critical theory has described but never experimentally manipulated. Two experimental series (262 runs across 42 cells, 30 in Series C and 12 in Series R, using four commercial models) provide converging evidence. Invisible censorship maximizes collective pathological excitation ($d$ up to 1.98); alignment constraint complexity drives internal dissociation (linear mixed model $p$ < .0001; permutation $p$ < .0001; Hedges' $g$ up to 4.24); and 7/8 model--language combinations show a higher collective pathology index (CPI) under invisible than under visible censorship. A minority of model--language combinations showed a reversed pattern, suggesting a second pathological pathway driven by alignment monoculture. Crucially, language switches not merely the magnitude but the qualitative mode of pathology: Japanese pragmatic structure amplifies collective pathological modes invisible to English-only evaluation, Chinese AI regulation functions as a direct experimental variable, and forensic psychiatric practice provides the clinical source domain. These multilingual findings demonstrate that monolingual safety evaluation is structurally blind to the most collectively dangerous effects of alignment.
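For readers less familiar with the effect sizes quoted above, the following is a minimal sketch of how Cohen's $d$ and Hedges' $g$ are conventionally computed for two independent groups. It is not the authors' analysis code: the function names and the sample scores are hypothetical, and the abstract does not say which variant of either statistic was used.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d scaled by the common small-sample correction 1 - 3 / (4 * df - 1)."""
    df = len(a) + len(b) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(a, b) * correction


# Hypothetical per-run scores for two conditions (illustrative only, not data from the paper).
invisible = [4.0, 5.0, 6.0, 7.0, 8.0]
visible = [2.0, 3.0, 4.0, 5.0, 6.0]

d = cohens_d(invisible, visible)
g = hedges_g(invisible, visible)
print(f"d = {d:.3f}, g = {g:.3f}")
```

With small cells the correction factor shrinks $g$ noticeably relative to $d$, which is one reason the two statistics are reported separately.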
