Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is substantially amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight, 3.2x the collapse observed in text-only self-evaluation, while three visual-domain strategies together receive only 9.1%. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to, and corrupt, strategy selection on another. Using a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion, in which the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (zero contagion in 70% of runs); and self-evaluation provides near-complete immunity, with 97% of runs (N=30, DeepSeek-chat) yielding exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.
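
The directional-asymmetry statistics quoted above (Delta and Cohen's d) can be illustrated with a minimal sketch. It assumes that Delta is the mean V->T coefficient minus the mean T->V coefficient, which agrees with the reported means up to rounding, and that d is computed with a pooled standard deviation; the abstract does not state either convention. The function name `asymmetry` and the per-run values are hypothetical and are not taken from the MM-EPC framework or its data.

```python
from math import sqrt
from statistics import mean, stdev


def asymmetry(gamma_tv, gamma_vt):
    """Return (delta, cohens_d) for two lists of per-run contagion coefficients.

    Assumed conventions (not stated in the abstract):
      delta = mean(V->T) - mean(T->V)
      d     = |delta| / pooled sample standard deviation
    """
    delta = mean(gamma_vt) - mean(gamma_tv)
    n1, n2 = len(gamma_tv), len(gamma_vt)
    pooled_var = (
        (n1 - 1) * stdev(gamma_tv) ** 2 + (n2 - 1) * stdev(gamma_vt) ** 2
    ) / (n1 + n2 - 2)
    pooled_sd = sqrt(pooled_var)
    # When every run has the same coefficient (e.g. all zero), the effect
    # size is undefined; this sketch reports it as 0.0.
    d = 0.0 if pooled_sd == 0 else abs(delta) / pooled_sd
    return delta, d


# Hypothetical per-run coefficients, for illustration only.
example_tv = [1.0, 1.2, 1.4, 1.1]
example_vt = [0.9, 1.1, 1.3, 1.0]
delta, d = asymmetry(example_tv, example_vt)
print(f"Delta = {delta:.3f}, Cohen's d = {d:.2f}")
```

A small |Delta| together with a small d, as in the reported GPT-4o and self-evaluation conditions, is what "symmetric" contagion means here: both directions may be strong, but neither dominates the other.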
